I. INTRODUCTION


One of the factors which limits human performance is the
limited capacity of human memory. Memory is commonly
considered to be divided into two parts: short-term and
long-term. Short-term memory is that part which we can
consciously access; it may be compared to the primary store
of a computer. It is characterized by rapid access and
volatility. Long-term memory is analogous to secondary
storage in that it is more permanent in nature than
short-term memory and requires more time and effort to
store information in and retrieve information from [1].

Short-term memory is a major limiting factor on human
performance because it is the memory which is consciously
accessible, and thus our working memory, and it is very
limited in its capacity. This memory holds units of
information for up to thirty seconds. That period may be
extended through repetition and rehearsal. The size of
short-term memory is approximately seven units of
information (plus or minus two). The nature of these units
is a function of experience and training. For example,
someone familiar with English may find it easy to remember
seven English words but difficult to remember seven Chinese
ideograms. Thus the information processing capacity of
humans can be easily overloaded. Long-term memory limits
performance because of the time and effort associated with
fetches from and stores to it [1].

The idea behind a Personal Database Management System
(PDBMS) is to provide an extension to both short-term memory
and long-term memory. A good PDBMS should provide its users
with means of storing information and later retrieving it
that are faster and more efficient than ordinary human
means. Long-term memory can be extended by allowing users
to easily store information which they find difficult to
memorize. Numerical items such as phone numbers, safe
combinations, and part numbers are examples of information
which is usually expensive in the amount of effort required
to ensure that it is not soon forgotten. Short-term
memory can be extended by providing users with a way to
relieve the burden upon its capacity. Instead of having to
remember a piece of information or a key (or cue) to
retrieving the desired information, a PDBMS can accept the
key as input and retrieve the desired information. Once the
key has been entered into the system, it may be forgotten,
freeing a portion of short-term memory for more information.
Also, retrieved information need not be memorized if the
PDBMS records it in a manner which allows it to be easily
accessed. For example, information recorded on a piece of
paper or on a display screen need not be memorized if it is
within easy reach.

What should be the characteristics, and what are the
requirements, of a Personal Database Management System?
Because it is designed for the storage and retrieval of
personal information, it is a single-user system. In order
to be useful to a broad range of people, it should permit
interaction at different levels, depending on the
sophistication of the user. Novice users will be easily
discouraged and see very little benefit if a system appears
to be illogical and complicated. Also, because of the
personal nature of the information in the database, the
system should provide security for that information.
Finally, in order to be acceptable, it should be small,
lightweight, and inexpensive.

This last requirement was taken to indicate that such a
system should be built using a battery-driven
microprocessor. Current microprocessor technology provides
more computer power than is needed strictly for a PDBMS, so
the design presented here incorporates the following
additional capabilities: 1) the ability to be used as a
calculator, 2) the ability to be programmed by the user, and
3) the ability to be connected into networks or to other
devices via an RS232 serial interface.

The PDBMS is programmed in a non-standard version of
FORTH. The particular one used here is neither fig-FORTH
nor FORTH-79, the two most prevalent versions of FORTH.
However, the basis for the language used is 8080 fig-FORTH,
version 1.3, which was partially modified to conform with
the FORTH-79 standard [2]. Further modifications were made
to this based upon hardware characteristics and the
suggestions and ideas of various members of the FORTH
Interest Group. In spite of this, when referred to in this
thesis, the language used in the PDBMS will be called FORTH.
One major distinction should be made, however: the PDBMS's
base vocabulary is called ROOT, not FORTH.
II. [chapter title illegible in source]


A. BACKGROUND

The largest part of the information presented in this
chapter was derived from a detailed study of four personal
address books (Appendix B contains detailed statistics from
this study). Address books were used as a basis for the
preliminary investigation of personal databases because they
were found to be more structured, standardized, and easily
computerized than other personal databases (e.g., shopping
lists, appointment calendars, and things-to-do lists).

The people interviewed during the study (some of whom
worked with computers daily) indicated that the maintenance
of personal databases is not analogous to management of
databases by computer. Indeed, the ways in which a database
management system (DBMS) is structured, maintained, and used
are very different from the way people manage their personal
information. The results of the author's studies and
interviews seem to indicate that the essential difference
between DBMSs and personal information management is the
number of "system" users. It is this difference that is the
apparent cause of almost all of the other differences.

Because DBMSs are normally organizational tools with
many users, records, fields, attribute values, query
languages, keys, etc., they must be standardized.
Organizational data is entered and retrieved by many
different individuals; without standardization, it would be
difficult for one person to know of information entered into
the system by another, much less retrieve it. On the other
hand, personal information is shared by only a few people,
if any. An important point here is that in a situation
where there is only one user, that user knows (or knew at
one time) all of the information in the system because he
entered it. People record and maintain personal information
in an auxiliary store in order to relieve themselves of some
of the burdens of recall and recognition. Because long-term
memory is generally considered to be permanent [1], the data
recorded in auxiliary stores need not be a verbatim copy of
the information which is to be retrieved later. Truly
personal information needs only to contain enough
context-specific cues to enable a person to reconstruct or
recall the structure of their semantic memory.

"The Recognition of Previous Encounters," by George
Mandler [3], describes semantic structures as an organization
of memory (referred to as a "familiarity variable"). These
structures represent the familiarity of events (and of the
entities which are part of an event), and are unique to each
particular event. Further, they are independent of the
context in which the event occurs or in which it is
embedded. Two sets of independent processes operate upon
semantic structures: intra-event processes, which are
referred to as "integration," and inter-event processes,
called "elaboration," which relate an event to others.
Mandler's hypothesis is that recognition is related to
integration, which is developed through attentive repetition
(rote learning). Recall is related to elaboration, which is
strengthened by the establishment of relational links
between the target event and other representations in
memory¹. Mandler does not describe how integration and
elaboration manifest themselves except in an abstract way.
They must involve the establishment of cues which act as
keys to semantic structures, whether the access be direct
(as one would expect in the case of integration) or indirect
(as might be the case for elaboration). It is these cues
which must be available to a person in order to retrieve the
desired events and entities. It is this that makes personal
databases different from DBMSs.

¹ Recognition is the process of going from a familiar
event to the context which caused the event to be
remembered. Recall is the opposite process, that is,
remembering an event from its context. When a person
attempts to remember where he knows a familiar face from, he
is employing recognition. Recall is what a person attempts
to do when he knows his wife told him to get something on
the way home, but has forgotten what.

Even though only the minimum number of cues need be
saved in order to retrieve information, the author's studies
revealed that usually more than the minimum required cues
are recorded. For example, there is usually no need to
record one's parents' city and state of residence, yet every
address book contained this, as well as other unnecessary
information. This is probably due in part to the fact that
address books are not always personal databases; sometimes
they are family documents. Appointment calendars appeared
to be the tersest of all the personal databases studied. An
example entry for March 10 might be "Rebecca 11:30," which
is a reminder that Rebecca has an appointment with Dr.
Feeney at the Pediatric Group, 698 Cass Street, 11:30 A.M.,
on March 10th.

In order to establish a common ground for comparison,
the following terms will be used throughout this thesis.

• Personal Database Management System (PDBMS): information
managed by this system is organized into files containing
records.

• Manual Database (MDB): a manually maintained file of
personal information. Because these databases are
normally not systematically managed as a group, there is
no MDBMS analogous to a PDBMS. Each MDB is separate and
distinct from all other MDBs: an address book,
appointment book, etc., are each MDBs.

• File: a relationship between records. An MDB is a file.
All records in a file are of the same format and related
by their grouping into the same file.

• Record: an entry in a file. In an address book, each
time a person or an organization is added to the
"address book file," a new record is added.

• Field: an entry in a record. In general, all records in
the same file have the same fields (and thus structure).
In an address book, the fields are usually called
"name," "street," "city, state, and zip code," and
"telephone number."

B. GENERAL CHARACTERISTICS

As stated before, people do not generally view personal
data as a database in the same sense as information in a
computerized database. Each MDB tends to be viewed as a
distinct entity, unrelated to any other MDB. Thus there is
no notion of a database management system (DBMS), since the
MDBs are not managed together as a group. As a result there
is often redundant information in MDBs when they are viewed
as a group. For example, an address book and an appointment
calendar probably both contain information about an
individual's insurance agent, realtor, dentist, etc.
Even though the possibility for joins and Cartesian products
exists, not only are they not performed, but the concepts
behind these operations are apparently incomprehensible to
the layman.

The existence of separate MDBs or files can be
intuitively explained by three reasons. The first, and most
obvious, is that the effort required to maintain even a
partially integrated database manually costs more than the
value gained by having such a database. Maintaining such a
database requires the establishment of all possible desired
relationships before the implementation of the database,
followed by the maintenance of complicated and troublesome
cross-indexes. Less effort is required to check one's
appointment book for appointments and then go to one's
address book to obtain the phone number to call in order to
confirm an appointment; or, if the requirement for a
confirmation was foreseen, to simply duplicate the phone
number in the appointment book.

The second reason is more subtle and might be related to
the ideas expressed in reference [3]. Even though the same
entity (person, organization, etc.) may be included in more
than one file, the different occurrences may represent
different views of that same entity; that is, file entries
are context-sensitive. When comparing address book records
to appointment calendar records, it is very common to find
that the address book entry for an individual is more formal
than an appointment book entry for the same individual. For
example, "Richard Elton" might appear as "Richard and May
Elton" in an address book, "Rich" in an appointment book,
and "Mr. Elton" in a personal note. This context-sensitive
nature of entries seems to indicate that integrating a
personal database is much more difficult than in the case of
traditional DBMSs.

The last reason is that inconsistencies between personal
MDBs (i.e., files) due to replication (redundancy) of data
are easily managed. This is not only because of the
individual and aggregate file sizes, but also because of the
nature of the data. The issue of size is obvious; the
important characteristic of the data which aids in solving
the problems of inconsistency is that the keys used for
access are closely related, if not identical, to cues used
to reconstruct semantic structures. For example, when a
person receives a change to his friend Pat's phone number,
it will probably prompt him to make a change in his
address/phone book. What changed was not the entity "Pat"
but just a value of one of the entity's attributes. So, for
the most part, the cues (which are context-free) associated
with "Pat" remain unchanged. There is a good possibility
that not all occurrences of the old phone number will be
updated. Later, when he comes across an occurrence of the
old number, it will elicit many of the same cues related to
"Pat" as would the address book entry. Chances are that he
will remember that the number was changed and was recorded
in his address/phone book. It will be then that the
inconsistency will be corrected, if it is at all. Perhaps
people rely upon this and intentionally do not make any
great effort to seek out inconsistencies.

1. Files

Manually maintained files are apparently organized
in two ways: sequential access and direct-keyed access.
MDBs which are direct-key accessed are normally recorded in
a commercially procured file or document. Examples of these
files are address books, which are designed to be keyed on
the first letter of a surname in the "name" field, and
appointment books, which are designed to be keyed on a date.
Sequentially maintained files are commonly kept on less
rigidly structured media such as notepads, chalk boards, or
scraps of paper. Information is usually entered
chronologically. Shopping lists, things-to-do lists, etc.,
are examples of sequentially organized files. Another
distinction between the two file types is the time-value of
the information stored in them. Indexed files usually
contain information which is to be retained for a longer
period of time than that contained in sequential files. It
was not uncommon to find address book entries which were
more than ten years old.

2. Records

With the exception of personal notes, records within
any particular file tended to be fairly uniformly formatted.
There is generally a core of fields which contain a value in
almost all records. However, many records contained
additional fields beyond the "core-fields." In the case of
address books these fields were inserted into the
pre-printed record formats by writing them vertically,
placing them in an unused, unrelated field, or placing them
into another record. The "core-fields" in address books are:
"name," "street," "city," "state," "zip code," "area code,"
and "telephone exchange and number." Typical additional
fields contain information such as:

• Account, Model, Serial, Policy, and Social Security
Numbers.

• Additional Phone Numbers (e.g., "home," "work,"
"marketing department," "service," "account inquiries,"
etc.).

• Birthdays and Anniversaries.

• Additional Names (e.g., children's names, points of
contact).

• Cards and Favors Sent and Received.

• Additional Miscellaneous Information (e.g., "then in
Seattle," "neighbors in Monterey," or "Uncle Bob's
brother-in-law").


In the case of address books, record deletion
appears to be an unpredictable event and probably a function
of the medium upon which the record is recorded. Bound
address books contain many more entries whose validity is
questionable. Many of these appear to be retained not only
because they were entered in ink, thereby making deletion a
messy affair, but for sentimental reasons. Many of the very
old entries are for high school and childhood friends.
Address books which permit easy deletion of records appear
to contain fewer old entries, but because deletions are not
recorded it is not easy to attribute this effect to the ease
of deletions.
3. Fields


Even though the fields' types and numbers appear to
be fairly standardized, the contents of the fields are not.
Fields appear to be variable length with no restriction on
content. Graphic, non-alphanumeric symbols such as hearts,
check-marks, and "happy faces" are not uncommon. Some files
contain indicators of the validity of the information in the
field (e.g., "?" or "as of Dec 81"). Abbreviations are not
consistently used in the same file; for example, one address
book examined contained all of the following entries:


    Street        St.        Str.
    Avenue        Ave.
    Virginia      Virg.      Va        VA
    Mr. & Mrs.    Mr/Mrs     Mr. and Mrs.

C.  DESIGN  IMPLICATIONS 

It appears obvious that a PDBMS and a DBMS are not the
same. As such, it is reasonable to construct a PDBMS
differently from a DBMS. Because a PDBMS is used as an aid
to recall contexts from memory, and the cues to these are
unique to each context [3], not only should the system have
no restrictions such as fixed field lengths and attribute
values, but additionally it should:


• Allow the user to use any word as a key.

• Be able to recognize and compensate for misspelled keys.

• Be able to take into account keys which are synonyms and
refer to the same entity (for examples see the
description of fields, above). Also it should have the
ability to discriminate between homonyms which appear to
be the same but refer to different attributes or entities
(for example, "CT" as an abbreviation for "Court" in a
street address versus "CT" as an abbreviation for
"Connecticut").
When interviewing laymen, it was found that they easily
understand the concepts of "file" and "record," but not
"field." This suggests that perhaps people conceptualize an
entity as a synergistic sum of its attributes rather than as
a relationship between attributes. Thus a record is the
smallest logical unit with which people normally deal
because it, as a whole, contains the cues necessary to
reconstruct semantic structures. The number of fields in a
record may be related to an individual's ability to
"integrate" the corresponding semantic structure [3].
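
The key-handling requirements listed earlier in this section
(synonymous abbreviations, homonyms that depend on the field,
and misspelled keys) can be illustrated with a minimal Python
sketch. It is not the thesis's FORTH implementation. The
synonym tables, the function names, and the similarity cutoff
of 0.8 are assumptions chosen for illustration, and the
standard-library difflib module stands in for whatever
matching a real PDBMS would use.

```python
import difflib

# Hypothetical per-field synonym tables (illustrative only).
# The same abbreviation may mean different things in different fields.
SYNONYMS = {
    "street": {"st": "street", "str": "street", "ave": "avenue", "ct": "court"},
    "state": {"va": "virginia", "virg": "virginia", "ct": "connecticut"},
}


def canonical_key(field, word):
    """Map a word to its canonical form within the given field."""
    key = word.lower().strip(".")
    return SYNONYMS.get(field, {}).get(key, key)


def match_key(field, word, known_keys):
    """Return the known key that the word most plausibly refers to, or None."""
    key = canonical_key(field, word)
    if key in known_keys:
        return key
    # Compensate for misspellings with an approximate string match.
    close = difflib.get_close_matches(key, known_keys, n=1, cutoff=0.8)
    return close[0] if close else None
```

Resolving the abbreviation per field is what separates "CT"
the street suffix from "CT" the state. The approximate match
is tried only after the synonym lookup fails.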

Because a PDBMS is an aid to an individual's recall, it
should faithfully preserve information entered and retrieve
it by logical means. If text compression or compaction² is
employed, it must be transparent to the user. Logical
retrieval means that if the user feels that he has given
sufficient information to specify the desired data, the
system should be able to either retrieve the data or give a
comprehensible reason why it could not be retrieved.
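
The distinction between compression (the exact original is
recoverable) and compaction (it is not) can be shown with a
short Python sketch. It is only an illustration: the zlib
module stands in for a lossless scheme, the blank-collapsing
rule is an invented example of compaction, and neither is
claimed to be what the PDBMS uses.

```python
import zlib

text = "Rebecca   11:30   Dr. Feeney,   Pediatric Group"

# Compression: redundancy is removed, yet the exact text comes back.
packed = zlib.compress(text.encode("ascii"))
restored = zlib.decompress(packed).decode("ascii")

# Compaction (illustrative rule): runs of blanks collapse to one blank.
# The original spacing cannot be reconstructed afterwards.
compacted = " ".join(text.split())
```

Only the first scheme satisfies the requirement to preserve
faithfully what was entered. The second could be transparent
to the user only if the discarded detail never matters to him.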

A PDBMS should be "user friendly" and require very
little effort on the part of the user. This means that
persons who have no need or desire to understand computers,
DBMSs, etc., should be able to use the system. Further,
file, record, and field formats should be easily specified
without the need for a plethora of technical details. Entry
and retrieval of data should also be fast and easy. Most
people who are not specifically trained on computers tend to
have much less tolerance for poorly engineered computer
systems, or ones requiring technical expertise, than do the

² Text compression and compaction involve removing
redundant information from text so that it can be stored
using fewer resources than if the original text had been
stored. The difference between the two is that an exact copy
of the original text is recoverable after compression,
whereas it is not after compaction.


III. [chapter title illegible in source]


A.  SOFTWARE 

When the user first receives the PDBMS, he sees only two
functions: a calculator and a database management system.
As the user learns how the system works, it is possible for
him to expand the system incrementally until eventually he
can reprogram a large portion of the system itself in FORTH
and/or assembly language.

Many of the keys on the PDBMS's keyboard are
programmable. They are initially used to allow the user to
enter commands by simply pushing a key. Instead of typing
"RECORD" when using the database management function, the
user needs only to push the "SHIFT" and "R" keys and the
system will enter the word "RECORD" for him.

1. The Calculator Function

The calculator which the user initially receives is
much like any other calculator. Two major ways in which
this function differs from most standard calculators are
that a series of arithmetic operations may be entered at
once, and that the user may create and use variables. Unlike
most calculators, the action of most of the keys on the
PDBMS is simply to enter textual data into the system. The
PDBMS does not interpret most of the input until the ENTER
key is pressed. So the following two key sequences have the
same effect, i.e., to add two to three and obtain five.
    ...  <enter>        ...  <space>
    ...  <enter>        ...  <space>
    ...  <enter>        ...  <space>
    ...  <enter>        ...  <enter>


As in FORTRAN, variables are created when they are
first used. If a word or a character is found in the input
which the calculator cannot recognize and it is to the right
of an equal sign, the calculator assumes that it is a
variable declaration and creates one. If an unrecognizable
word or character is encountered to the left of an equal
sign, an error condition is signalled.
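
The rule for creating variables can be sketched in Python as
follows. The text does not give the calculator's input
syntax, so this sketch assumes a FORTH-like postfix
expression followed by "= name". The function name, the
operator set, and the choice of NameError as the error
condition are likewise assumptions made for illustration.

```python
import operator

# Assumed operator set; the thesis does not enumerate the calculator's keys.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(line, variables):
    """Evaluate '<postfix expression> [= <name>]' and return the result."""
    left, equal_sign, right = line.partition("=")
    stack = []
    for token in left.split():
        if token in OPERATORS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPERATORS[token](a, b))
        elif token in variables:
            stack.append(variables[token])
        else:
            try:
                stack.append(float(token))
            except ValueError:
                # Unrecognized word to the LEFT of '=': error condition.
                raise NameError("unrecognized word: " + token)
    result = stack.pop()
    name = right.strip()
    if equal_sign and name:
        # Unrecognized word to the RIGHT of '=': created on first use.
        variables[name] = result
    return result
```

For example, evaluating "2 3 + = X" with an empty variable
table stores the sum in X. A later line may then use X on
the left of an equal sign without error.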

2. The Database Management Function

The database management function allows the user to
create files and records, delete files and records, retrieve
records, and use keys (i.e., passwords) to seal records and
other keys as a means of providing data security. The user
is not required to deal directly with the technicalities of
database data structures; he only needs to know that files
are collections of records, all having the same format.
Files appear to the user to be separate and disjoint,
similar to MDBs. The procedure for creating a file requires
only that the user specify the file's name and the names of
the fields within the records of the file. The user is led
through the process of file creation and record retrieval by
system prompts.

Records may be retrieved by using any word (or group
of words) contained within them. The only restriction on
this is that the user must specify which field is to be
searched for the target word(s). This restriction should
not seem unnatural to the user but, rather, necessary.
Because any word is a possible key attribute, the user must
be able to specify the context of the target word. By
specifying the field name with queries, the user is able to
retrieve a record using Mr. York's last name without also
retrieving all of the records containing "New York."
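
Field-scoped retrieval of this kind can be sketched in
Python. The record layout (a dictionary per record), the
function names, and the sample entries are invented for
illustration. The PDBMS's actual query mechanism is written
in FORTH and is not reproduced here.

```python
import re


def words(text):
    """Split text into lower-cased words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(records, field, target):
    """Return the records whose named field contains every target word."""
    wanted = set(words(target))
    return [r for r in records if wanted <= set(words(r.get(field, "")))]


# Invented sample data: one Mr. York, one resident of New York.
address_book = [
    {"name": "Mr. Alan York", "city": "Monterey"},
    {"name": "Pat Jones", "city": "New York"},
]
```

Searching the "name" field for "York" returns only the first
record, while the same word in the "city" field returns only
the second.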

B.  DATA  STRUCTURES 

The PDBMS uses some data structures which might be
considered unusual when compared to other database
applications. Some of these are characteristic of FORTH and
others are used because of the nature of the system.

1. Dictionaries

Two different dictionary structures are used in the
PDBMS. One dictionary is that which is associated with
FORTH. The second is conceptually more like a dictionary
as a layman might think of one. A FORTH dictionary is simply
a linked list of FORTH definitions. The definitions are
maintained in chronological order by their time of creation.
These definitions typically describe the following basic
FORTH word-types: colon definitions, constants, variables,
user variables, and vocabularies. Colon definitions are
FORTH definitions which are defined in terms of previously
created definitions, similar to procedures and functions in
other languages. Vocabularies are "sub-dictionaries" and
are used to delimit the scope of definitions.
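
A linked-list dictionary of this kind can be modelled in a
few lines of Python. The sketch follows the conventional
FORTH arrangement, in which each new definition links back to
the one created before it and a search starts from the newest
entry. Whether the PDBMS's FORTH variant behaves exactly this
way is an assumption, and the class and method names are
invented.

```python
class Definition:
    """One dictionary entry; 'link' points to the previous definition."""

    def __init__(self, name, body, link):
        self.name = name
        self.body = body
        self.link = link


class Dictionary:
    """Definitions chained in order of creation, searched newest first."""

    def __init__(self):
        self.latest = None

    def define(self, name, body):
        self.latest = Definition(name, body, self.latest)

    def find(self, name):
        entry = self.latest
        while entry is not None:
            if entry.name == name:
                return entry.body
            entry = entry.link
        return None
```

Because the search runs from the newest definition backwards,
redefining a name hides the older definition without removing
it from the list.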

The other dictionary is called the DB dictionary and
it is used to store the words entered and contained in the
database. Words are entered into the dictionary and
looked up by hashing to a linked list using the first letter
or digit of the target word, and then traversing the list,
which is alphabetically sequenced. Punctuation is not
stored in the DB dictionary.
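
The lookup scheme just described (hash on the first
character, then walk an alphabetically ordered linked list)
might look as follows in Python. The identifier stored in
each node is an assumption: the text says only that a number
identifies a word within its list, not how that number is
assigned. All names are invented.

```python
class Node:
    def __init__(self, word, ident, next_node):
        self.word = word
        self.ident = ident      # assumed: number issued when the word is added
        self.next = next_node


class DBDictionary:
    def __init__(self):
        self.heads = {}     # first character -> head of its sorted list
        self.counts = {}    # first character -> identifiers issued so far

    def enter(self, word):
        """Insert the word if absent; return (first character, identifier)."""
        key = word[0]
        prev, node = None, self.heads.get(key)
        while node is not None and node.word < word:
            prev, node = node, node.next
        if node is not None and node.word == word:
            return key, node.ident
        ident = self.counts.get(key, 0)
        self.counts[key] = ident + 1
        new = Node(word, ident, node)
        if prev is None:
            self.heads[key] = new
        else:
            prev.next = new
        return key, ident

    def bucket(self, key):
        """List the words filed under one first character, in list order."""
        out, node = [], self.heads.get(key)
        while node is not None:
            out.append(node.word)
            node = node.next
        return out
```

Keeping the identifier in the node rather than using the
word's position lets the list stay alphabetical as words are
added without invalidating identifiers already handed out.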

2. Files

Files are completely inverted. They contain only
administrative data, and indices and pointers into the DB
dictionary. Information which is retrieved from the
database is reconstructed a word at a time by looking words
up in the dictionary (punctuation is stored directly in the
database in its ASCII format). Memory for files, the DB
dictionary, and sealed keys (discussed later) is allocated
from a heap, so that none of these data structures occupies
contiguous memory. A file is defined as a FORTH vocabulary
and its definition contains pointers to the first and last
records in the file. Records are maintained as a circular,
doubly linked list. The fields are defined as FORTH
constants in their respective file's vocabulary. Their
value is an ID number which is used to relate the fields in
the database to the names assigned to them by the user.
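
The file organization described above can be sketched in
Python: a file header holding first and last pointers into a
circular, doubly linked list of records, plus a table of
field names and their ID numbers. The class names and the
numbering of fields from zero are assumptions. In the PDBMS
itself these are FORTH vocabulary entries and constants.

```python
class Record:
    def __init__(self, fields):
        self.fields = fields
        self.prev = self.next = self    # a lone record links to itself


class File:
    def __init__(self, name, field_names):
        self.name = name
        # Each field name is bound to an ID number, much as a constant would be.
        self.field_ids = {n: i for i, n in enumerate(field_names)}
        self.first = self.last = None

    def append(self, values):
        rec = Record(values)
        if self.first is None:
            self.first = self.last = rec
        else:
            rec.prev, rec.next = self.last, self.first
            self.last.next = rec
            self.first.prev = rec
            self.last = rec
        return rec

    def records(self):
        """Walk the ring once, starting at the first record."""
        rec = self.first
        while rec is not None:
            yield rec
            rec = rec.next
            if rec is self.first:
                break
```

With a circular list, the record after the last is the first,
so stepping forward or backward through a file never runs off
an end.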

3. Logical Records

To the user a record appears to be a collection of
information related to a particular entity. The fields help
to organize the data by grouping it. The logical record
itself is variable in length. The first set of bytes in a
record contains the record's access descriptor, which is
variable in length. This is followed by the links (or
pointers) to the previous and next records in the file.
Following these pointers are the fields, which are fixed in
number (as determined in the file's definition) but are
each variable in length. Fields are separated by an
end-of-field (EOF) marker. Because records contain a fixed
number of fields, the last EOF serves as an end-of-record
marker.
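
The record layout can be made concrete with a Python sketch
that packs and unpacks one record. Several details are
assumptions, since the text does not specify them: the
access descriptor is prefixed by a one-byte length, each link
is a 16-bit little-endian address, and the EOF marker is a
zero byte that never occurs inside a field.

```python
import struct

EOF = b"\x00"   # assumed value of the end-of-field marker


def pack_record(descriptor, prev_addr, next_addr, fields):
    """Lay out: descriptor (length-prefixed), two links, then the fields."""
    head = bytes([len(descriptor)]) + descriptor
    links = struct.pack("<HH", prev_addr, next_addr)
    body = b"".join(field + EOF for field in fields)
    return head + links + body


def unpack_record(data, field_count):
    """Return (descriptor, prev, next, fields, record length in bytes)."""
    n = data[0]
    descriptor = data[1:1 + n]
    prev_addr, next_addr = struct.unpack_from("<HH", data, 1 + n)
    pos = 1 + n + 4
    fields = []
    for _ in range(field_count):
        end = data.index(EOF, pos)
        fields.append(data[pos:end])
        pos = end + 1
    # The EOF of the last field doubles as the end-of-record marker.
    return descriptor, prev_addr, next_addr, fields, pos
```

Because the number of fields is fixed by the file's
definition, the reader knows where the record ends without
any separate record-length field.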
Fields are a continuous string of bytes which represent
the data contained in the field. Punctuation appears
in its ASCII format (one character per byte). Words are
represented by two bytes: the first contains the word's
initial letter (or digit), which is used to hash into the DB
dictionary; the second byte is a number used to identify
the particular member of the linked list hashed to,
representing the target word.

5. Keys

Keys may be thought of as passwords which are used
to secure records, FORTH screens, and other keys (called
sealed keys). These objects (i.e., records, screens, and
keys) all have access descriptor fields which contain
information about what keys are necessary to access the
particular object. Keys allow the user to construct fairly
complex access mechanisms.

C.  HARDWARE 

Figure 3.1 is a simple picture of the layout of the
PDBMS's hardware. The system makes extensive use of CMOS
technology so that it can be battery driven. There are six
major components in the system.

1. Erasable Programmable Memory

Erasable programmable read-only memory (EPROM) occupies
the system's low memory and contains the PDBMS's
operating system. There are 16K bytes of EPROM in the
system. As its name implies, its contents cannot be altered
by the user.


Figure 3.1 PDBMS Hardware Configuration.
2. Random Access Memory


Random access memory (RAM) is used by the user as
his workspace. System parameters and data structures which
change according to the runtime environment are also
maintained in RAM. There are 16K bytes of RAM.

3. Electrically Erasable Programmable Read-Only Memory

Electrically erasable programmable read-only memory
(EEPROM or E²PROM) serves as the system's secondary storage.
The unique characteristic of E²PROM is that it can be erased
(i.e., written into) under software control, as RAM can, but
it is non-volatile (i.e., its contents are not lost when the
power is turned off). Part of the E²PROM is not accessible
to the user because it is used by the system for E²PROM
memory management and for database management and storage.
What is not used by the system is available to the user as
FORTH screens.

4. Liquid Crystal Display/Keyboard

The liquid crystal display (LCD) serves as the
system's console. It contains two rows of 20 characters.
It is attached directly to the system's bus, and any data
written into memory beginning at address C000H appears on
the LCD. The keyboard provides the means by which the user
can directly input data into the system. It is connected to
the system's bus via a parallel I/O port.

5. Central Processing Unit

The PDBMS uses an NSC800 microprocessor operating at
a clock rate of 1 MHz. This is a CMOS microprocessor which
is downwardly compatible with the Z80. It was chosen as the
system's CPU because of its low power consumption and the
availability of software. The slow speed is not an issue
with this system because of the naturally slow nature of
human-computer communications.

6. RS232 Port

This port allows the user to interface his system
with other systems.
A. CONVENTIONS AND NOTATION

The nature of words in FORTH does not lend them to be
referred to by enclosing them in quotes, so instead they
will appear in upper-case boldface. However, because
boldface punctuation is often hard to distinguish from
standard text punctuation, the following eight FORTH words
will be enclosed in braces:

•  •  t  ?  i  it 

Additionally, FORTH words composed entirely of strings of
these characters will be enclosed in braces (for example,
{-">). 

Finally, to avoid ambiguity, the following conventions
will be used when using the three words "key," "word," and
"dictionary." When there is a possibility of confusing the
FORTH meaning of "word" (described below) and the accepted
computer term "word" (i.e., two bytes or 16 bits on the 8080
and Z80 microcomputers), the former "word" will be called a
"word" or a "FORTH word," whereas the latter "word" will not
be used; instead "two bytes" will be used. Adding further
possibilities for confusion is the third meaning of "word."
This third meaning is the usual English connotation of
"word," and these "words" are data in the PDBMS. The
ubiquitous FORTH response, "OK," and words entered by the
user as responses to the system prompts and as data to be
included into the database are "words" in this third class.
Data words of this type will be called "uwords." Because
uwords entered into the database may be altered before they
are entered into the database dictionary, the words which reside
uword ::=

<wordd><punctuation> | <punctuation>

punctuation ::=

#  l«  I/I  ♦  |*|  - |<space> |3 1  (|)  I : 

% 

•  mm  d^Ce 

space ::=

20H

wordd ::=

<wordd><char> | <char>

char ::=

1|2|3|4|5|6|7|8|9|0|A|B| ...


in the database dictionary will be referred to as "wordds."
Table I shows the BNF definitions of both uword and wordd.

In order to distinguish between a "key" on the keyboard
and a "Key" which is used as a password to SEAL and UNSEAL
data objects, the latter "Key" will always begin with a
capital "K." Finally, many of the system data structures
are maintained as FORTH dictionaries (also referred to as
vocabularies), while wordds are stored in a data structure
which is not a FORTH dictionary but which may also
rightfully be called a dictionary. The following
convention will therefore be followed: when the possibility
of ambiguity exists, the dictionary being referred to will be
prefaced by its name (e.g., root dictionary, DB dictionary,
etc.).

B. PHYSICAL MEMORY AND I/O PORTS

1. Addresses and I/O Ports

Physical memory is that memory in which FORTH
programs execute. This memory lies entirely within the
user's address space. The PDBMS's physical memory consists
of a little more than 32K bytes (see Figure 4.1). The lower
memory (0000H to 3FFFH) is EPROM, and the high memory (4000H
to 7FFFH) is RAM. Additionally there are 256 bytes of
memory located at addresses C000H through C0FFH; the first
40 bytes of these 256 bytes represent the 2 lines of 20
characters on the liquid crystal display (LCD). The
contents of these memory locations are interpreted as ASCII
encoded data and are mirrored on the LCD. Thus the LCD is
directly addressable via the system's bus. Finally, memory
locations FF00H to FFFFH comprise the virtual E²PROM window.
When a segment is accessed from E²PROM by writing its
segment number to the segment register and "powering up" the
E²PROM, it appears at these addresses and may be read from
and written to. When E²PROM power is off these addresses
are invalid.
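
The address map described above can be summarized in a few lines of Python. This is only an illustrative sketch: the address ranges come from the text, while the region names, the function names, and the assumption that the 40 LCD bytes are laid out row by row are not from the original design.

```python
# Address ranges as given in the text; the names are illustrative.
REGIONS = [
    (0x0000, 0x3FFF, "EPROM"),
    (0x4000, 0x7FFF, "RAM"),
    (0xC000, 0xC0FF, "LCD"),
    (0xFF00, 0xFFFF, "E2PROM window"),
]

LCD_BASE = 0xC000
LCD_ROWS, LCD_COLS = 2, 20


def region_of(address):
    """Return the name of the region holding `address`, or None if unmapped."""
    for low, high, name in REGIONS:
        if low <= address <= high:
            return name
    return None


def lcd_position(address):
    """Map an address to (row, column) on the LCD.

    Assumes a row-major layout of the first 40 bytes (an assumption).
    Returns None for addresses that are not shown on the display.
    """
    offset = address - LCD_BASE
    if 0 <= offset < LCD_ROWS * LCD_COLS:
        return divmod(offset, LCD_COLS)
    return None
```

Under these assumptions, `region_of(0x8000)` reports an unmapped address, and `lcd_position(0xC000 + 25)` falls on the second display row.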

There are two ports which are directly associated
with the user's address space and accessible to him. One
port is a read-only port used to receive data from the
keyboard (it is envisioned that the keyboard will eventually
be tied directly to the system's bus). This port is located
at FBH. The other port is a UART port configured for an
RS232 serial interface and is located at FAH.

Finally, three locations are set aside as jump
vectors. These are predetermined by the NSC800 hardware in
interrupt mode 1, which mimics the Z80. The cold boot vector
is located at 00H. The non-maskable interrupt (NMI) jump
vector is found at 66H. This interrupt is generated by two
conditions: whenever the system is "turned off" by the user
and whenever the system is reset (via the reset button).
Because of the slow nature of the E²PROM, it may be possible
for the user to turn the power off or reset the system
before a write-cycle involving a large block of data has
been completed. The virtual memory manager is the ultimate
recipient of NMIs. Upon receiving one, it waits for the
write-cycle to be completed and then sets bits 1, 0, and 4
of the control port accordingly. After doing that, a jump
to warm boot is executed. Setting bit 4 to one when the
power switch is in the on position has no effect, so the
same interrupt handling routine correctly handles both
interrupt sources. Ten seconds after an NMI generated by the
power-off condition, the hardware automatically shuts itself
off, if it is still on at that time. The third location is
38H, which contains the maskable interrupt (MI) vector. Both
the keyboard and E²PROM generate interrupts which vector
here; the device requiring service is determined by reading
the status register (described below).

2. Data Structures

Figure 4.1 shows the allocation of physical memory
to data structures in the PDBMS. It varies from the
configuration in Figure A.1 only in that it has data buffers and
pointer buffers. These buffers share memory with the buffer
blocks. Block and data buffers are not used concurrently so
they do not occupy the buffer area at the same time³. The
data buffers are used for encoding and decoding individual
database records. Records are read into the buffers as they
appear in E²PROM (less key ID numbers and administrative
pointers) and then are decoded into their ASCII
representation, which is placed into the current record buffer and the
LCD window. In most cases only a portion of the record fits
into the 40 character LCD. The first two bytes of each data
buffer contain the resident record's virtual pointer (FFFFH
indicates an empty buffer).


³Even if the PDBMS is designed so that it LOADs definitions
from screens during execution of database operations,
there is no problem. This is because the block buffers are
not used during a LOAD; the E²PROM is simply read directly
without using a buffer.
The pointer buffers serve several purposes. During
retrieval operations buffer number one holds the pointers to
records to which the user is authorized access and which
have satisfied all query conditions processed so far. The
second buffer holds pointers to records to which the user is
authorized access and which satisfy the current query
condition being processed. After the processing of each query
condition is completed, the intersection or union of the two
buffers (depending upon the query) is placed into buffer one.
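
The two-buffer retrieval scheme can be sketched with Python sets. The function below is a hypothetical illustration, not the PDBMS code: the names `run_query`, `conditions`, and `authorized`, and the use of the strings "and" and "or" to select intersection or union, are assumptions made for the example.

```python
def run_query(conditions, authorized):
    """Combine query conditions the way the two pointer buffers are described.

    conditions: list of (operator, pointers) pairs, where operator is
                "and" (intersection) or "or" (union) and pointers is the
                set of record pointers matching that condition.
    authorized: set of record pointers the user may access.
    """
    buffer_one = None  # records satisfying all conditions processed so far
    for operator, pointers in conditions:
        # Buffer two: authorized records matching the current condition.
        buffer_two = set(pointers) & authorized
        if buffer_one is None:
            buffer_one = buffer_two
        elif operator == "and":
            buffer_one = buffer_one & buffer_two
        elif operator == "or":
            buffer_one = buffer_one | buffer_two
        else:
            raise ValueError("unknown operator: %r" % operator)
    return buffer_one if buffer_one is not None else set()
```

The operator attached to the first condition is ignored in this sketch, since there is nothing yet in buffer one to combine it with.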

C. VIRTUAL MEMORY AND CONTROL PORTS

1. Hardware

In the PDBMS, E²PROM is used as secondary storage.
A total of 8K bytes of E²PROM is included and it is
segmented into 32 segments, each 256 bytes in size.
Segments (analogous to FORTH blocks) are further divided
into physical records 16 bytes in size. Figure 4.2 shows
the bus interface of the Intel 2816 E²PROM chips. As in
standard FORTH, the user and user programs deal with
physical addresses only. The user can only refer to virtual
memory by using screen numbers. However, some PDBMS words
use two byte virtual addresses to access physical records in
virtual memory. Only assembly language coded words
("low-level" words) can directly fetch and store bytes in
E²PROM via the window.

PDBMS virtual addresses consist of two bytes. One
byte contains a segment number and the other a physical
record number within the segment. Because only four bits
are needed to designate a physical record, if it were
technically feasible the system could accommodate 512K bytes of
E²PROM.
Figure 4.2 2816 E²PROM Configuration.
Only 15 of the 16 bits are used for virtual
addresses. The high bit (bit 7 of the Most Significant
Byte, MSB) is used to differentiate virtual from physical
addresses in E²PROM and RAM. Virtual addresses which move
from E²PROM to RAM and vice versa must pass through low
level FORTH words which ensure RAM and E²PROM virtual
addresses never get mixed in with each other. E²PROM
virtual addresses have their high bit set to zero while RAM
virtual addresses have their high bit set to one. Thus
virtual addresses appear to be out-of-range references
within the domain in which they occur. For example, if an
address referenced inside an E²PROM segment is less than
8000H, then it is a virtual address to another segment.
Intra-segment addresses are always greater than or equal to
FF00H (all of which have a high bit of one). This means
that, as in standard FORTH, "programs" cannot be executed
directly from secondary storage but must be LOADed first.
This allows all code field addresses (CFA) to be interpreted
as physical addresses, whether they occur in RAM, EPROM, or
E²PROM, so there is no problem associated with storing
constants and variables in E²PROM. Care must be exercised
to ensure that LCD window addresses are never used in the
same RAM context as RAM virtual addresses since they would
be indistinguishable from each other.
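
The arithmetic behind the 512K figure, and the high-bit test used inside an E²PROM segment, can be checked with a short Python sketch. The constants come from the text. The packing of the segment number into the most significant byte and the record number into the least significant byte is an assumption made for illustration, as are all function names.

```python
USABLE_BITS = 15      # the high bit is reserved as the virtual/physical flag
RECORD_BITS = 4       # 16 physical records per 256-byte segment
RECORD_SIZE = 16      # bytes per physical record
HIGH_BIT = 0x8000
WINDOW_BASE = 0xFF00


def addressable_bytes():
    """Bytes reachable if every spare bit were used for segment numbers."""
    segment_bits = USABLE_BITS - RECORD_BITS          # 11 bits
    return (1 << segment_bits) * (1 << RECORD_BITS) * RECORD_SIZE


def pack(segment, record):
    """Build a two-byte virtual address (assumed layout: segment in the MSB)."""
    if not 0 <= segment < 0x80 or not 0 <= record < 16:
        raise ValueError("segment or record number out of range")
    return (segment << 8) | record


def unpack(address):
    """Split a virtual address into (segment, record) under the same layout."""
    return (address >> 8) & 0x7F, address & 0x0F


def classify_in_e2prom(address):
    """Classify an address found inside an E2PROM segment."""
    if address < HIGH_BIT:
        return "virtual"          # pointer to a record in another segment
    if address >= WINDOW_BASE:
        return "intra-segment"    # physical address within the window
    return "other"                # not described in the text
```

With these definitions `addressable_bytes()` evaluates to 524,288, which is the 512K bytes quoted above.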

The E²PROM can be read in 450 nsec; however, it
requires 20 msec⁴ to write one byte (all of the bytes on
each chip may be erased in one 10 msec operation).
Additionally the 2816 must be strobed with a 21 volt pulse
during the write process. This means that E²PROM cannot be


⁴Intel literature states that their E²PROM requires 10
msec per write, which is true. However, in order to ensure
that the data is properly recorded, the addressed byte
should contain FFH before it is written into if a write
requires a zeroed bit to be changed to one. Thus writing
involves two write operations: one to set the target byte to
FFH, and a second to write the desired value.


treated the same as RAM. Other non-volatile memories were
considered for this design, such as NOVRAM and Instant ROM.
Both of these alternatives can be treated almost as if they
were RAM; however, they were judged unsuitable. NOVRAM was
not found to be a feasible choice because of its small size.
The largest NOVRAM chip contains only 256 bytes, thus 8K of
NOVRAM cannot be battery powered because of the large number
of chips that would be required. Instant ROM was also found
to be undesirable because it contains its own battery power.
The on-chip battery is guaranteed for three years, and this
is hardly suitable for a permanent database. Currently
available hand-held computers use concepts similar to
Instant ROM; they use CMOS memories which are constantly
refreshed, even when they are turned "off."

The E²PROM and the PDBMS are controlled through three
control ports. One port, the segment register, is used to
select the desired segment. This port is located at F8H and
is write-only. The second port is the status register. It
is located at F9H and it is read-only; it reflects the
system's current status. Figure 4.3 shows the status port's
configuration. Complementing the status register is the
control register, which is a write-only port located at F9H.
The control register is used to effect system changes. This
port is described in Figure 4.4. These ports, as well as
all other ports, are "smart" ports in that they only accept
instructions from code being executed from EPROM. The
hardware does this by checking the program counter, which the
NSC800 places on the address bus prior to an opcode fetch. If
the A15 and/or A14 lines of the address bus are high the
next instruction is ignored. E²PROM power and write-power
are turned on and off by setting bits 0 and 1 accordingly.
Whenever either of these bits is set to one, bit 7 of the
status register is set to zero. After the chips have been
powered-up, bit 7 of the status register is set to one, so
Figure 4.3 Status Port Flags (IN F9H)

[Figure columns: Bits, Flag Meanings, Boot-up Values. Flags
shown: E²PROM ready or not ready; E²PROM power; E²PROM
write-power on or off; E²PROM interrupt pending or not
pending; keyboard interrupt pending or not pending; UART
receiver ready or not ready; UART transmitter ready or not
ready.]
is bit 6 or 5 (depending upon whether bit 0 or 1 of the
control register had been set). Additionally, whenever bit
7 is set to one (except during a cold boot of the system),
an MI is generated. When bit 7 of the control register is
set to one, bit 7 of the status register goes to zero. When
the E²PROM write-cycle has been completed, bit 7 goes high
and an MI is generated.

Changes in bits 0 and 1 of the status register do
not generate interrupts, but when bit 2 goes high
(indicating keyboard input) an MI is generated. Reading the
status register resets bit 2 to zero.
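
The "smart" port check described earlier in this section reduces to a test of the two high address lines during the opcode fetch. The Python fragment below is a minimal sketch of that rule only; the function name is hypothetical, and the real check is performed in hardware, not software.

```python
def executing_from_eprom(program_counter):
    """Return True when both A15 and A14 are low.

    EPROM occupies 0000H to 3FFFH, so any opcode fetched from it has the
    top two address lines low. A port would ignore the next instruction
    whenever this function returns False.
    """
    return (program_counter & 0xC000) == 0
```

Code running from RAM (4000H to 7FFFH) or from the E²PROM window (FF00H to FFFFH) has A14 or A15 high and therefore fails the test.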

Notice from Figure 4.2 that the four 2816 chips are
interleaved so that all addresses equal to zero, mod four,
are on the first chip (i.e., those addresses whose last
hexadecimal digits are 0, 4, 8, or C). Those equal to one,
mod four, are on the second chip, etc. This arrangement
facilitates fast writing of blocks of data to E²PROM because
four contiguous bytes may be written simultaneously. Thus
in the best case (when four contiguous bytes are written)
the average write-time per byte is approximately 5 msec and
an entire segment can be written in 1.25 seconds. Actually
more time is required, but the additional time is minor when
compared to the gross nature of the E²PROM write-time. The
additional time involves reading and comparing the contents
of the E²PROM to the appropriate buffer's contents (data or
block buffer). The entire write-cycle algorithm is shown in
Table II.
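
The interleaving and the best-case timing can be worked through in a few lines of Python. This is a sketch of the arithmetic only: the 20 msec per-byte write time, the four chips, and the 256-byte segment come from the text, while the function and constant names are illustrative.

```python
CHIP_COUNT = 4           # four interleaved 2816 chips
SEGMENT_BYTES = 256
WRITE_MS_PER_BYTE = 20   # per-byte write time quoted in the text


def chip_for(address):
    """Index (0 to 3) of the chip holding `address`, by address mod four."""
    return address % CHIP_COUNT


def best_case_ms_per_byte():
    """Average write time per byte when four bytes are written at once."""
    return WRITE_MS_PER_BYTE / CHIP_COUNT


def best_case_segment_seconds():
    """Time to write a whole segment as groups of four contiguous bytes."""
    groups = SEGMENT_BYTES // CHIP_COUNT
    return groups * WRITE_MS_PER_BYTE / 1000.0
```

Evaluated this way, the per-byte average is 5 msec and a full segment takes 64 groups of 20 msec, or 1.28 seconds, which is in line with the approximate figure quoted above.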

2. Organization and Data Structures

The 8K bytes of E²PROM are divided into two types of
segments: system segments and block (or screen) segments.
System segments are owned by the system and cannot be
directly accessed by the user or his programs. Block
segments are those which contain screens, in the usual FORTH
Figure 4.4 Control Port Flags (OUT F9H)

[Figure columns: Bit, Set Meanings. Entries, in the order
shown in the figure:
1: Start E²PROM write-cycle; 0: No effect
Not used
Not used
1: Turn system off (E²PROM must be off first); 0: No effect
1: Turn E²PROM write-voltage on; 0: Turn E²PROM write-cycle voltage off
1: Turn E²PROM power supply on; 0: Turn E²PROM power supply off]


TABLE II

Virtual Memory Write-cycle Algorithm


J = START_OF_SEGMENT;

REPEAT UNTIL NO_MORE_BYTES;

    DO I = J TO J+3;

        READ E2PROM_BYTE(I);

        IF BUFFER_BYTE(I) ¬= E2PROM_BYTE(I) THEN DO;

            /* A ZEROED BIT MUST BECOME ONE: SET BYTE TO FFH FIRST */
            IF BUFFER_BYTE(I) & ¬E2PROM_BYTE(I) ¬= 0 THEN
                E2PROM_BYTE(I) = FFH;

            E2PROM_BYTE(I) = BUFFER_BYTE(I);

        END DO;

    END DO;

    CONTROL_PORT_BITS(7) = 1;

    LOW POWER HALT;  /* WAIT FOR INTERRUPT */

    DO I = J TO J+3;

        READ E2PROM_BYTE(I);

        IF BUFFER_BYTE(I) ¬= E2PROM_BYTE(I) THEN

            SIGNAL(E2PROM_WRITE_ERROR);

    END DO;

    J = J + 4;

END REPEAT;


sense, and are available to the user. Blocks are allocated
sequentially in a round-robin fashion by the memory manager.
This means that the next segment to be allocated is the next
higher unallocated segment after the last allocated segment.
When the 32nd segment is reached, allocation begins again
from the first segment not initially assigned to the system
(i.e., when the software was placed into the system). This
scheme is used in an attempt to more uniformly distribute
the E²PROM use. If a "lowest available segment algorithm"
were used, there would be a higher probability that the portion
of E²PROM assigned to the low numbered segments might "burn
out" (E²PROM is limited to 10,000 write operations to each
individual byte).
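
A round-robin allocator of this kind might look like the following Python sketch. It is not the PDBMS memory manager: the function name, the list-of-booleans allocation map, and the `first_user_segment` parameter (the first segment not initially assigned to the system) are all assumptions made for illustration.

```python
SEGMENT_COUNT = 32


def next_segment(allocated, last_allocated, first_user_segment):
    """Pick the next free segment after `last_allocated`, wrapping around.

    allocated:          list of 32 booleans, True if the segment is in use.
    last_allocated:     index of the most recently allocated segment; assumed
                        to be at least first_user_segment - 1.
    first_user_segment: first segment not initially assigned to the system.
    Returns the chosen segment index, or None if every segment is in use.
    """
    candidate = last_allocated
    for _ in range(SEGMENT_COUNT - first_user_segment):
        candidate += 1
        if candidate >= SEGMENT_COUNT:
            candidate = first_user_segment   # wrap past the 32nd segment
        if not allocated[candidate]:
            return candidate
    return None
```

Because the search always starts just after the last allocation, a freed low-numbered segment is not reused until the search has passed through all higher-numbered free segments, which is what spreads the write wear.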

a. System Segments

System segments are those which are used by the
PDBMS for virtual memory management data structures and the
database. The user cannot directly access these segments
because any segment allocated to the system is not placed in
the block number dictionary. System routines address these
segments directly (i.e., they "know" the physical segment
numbers whereas the user knows only virtual block or screen
numbers). At least four segments are dedicated to the
system; the system and the user compete for the remaining
segments (less system message screens), which are allocated
on a first-come, first-served basis. Additional system
segments (beyond the dedicated four) are used to accommodate
the expanding database. Because the database resides in
system segments, the user cannot see its physical
structure; he is limited to viewing it through the PDBMS. The
first four segments are structured as described below.

(1). Parameter Table. This segment contains a
collection of system parameters and tables. For example,
most of the cold boot parameters are loaded from here. Also
located here is the vocabulary table.

(2). Key Sub-Dictionary. Security in the PDBMS
is provided in part by Keys. These Keys are used to seal
records, blocks, and other Keys. These Keys are maintained
in a linked list dictionary as a separate VOCABULARY. The
Key vocabulary definition is located in EPROM. The code
pointer of each Key points to the run-time code for CONSTANT,
which is located at docon. Thus when the Key is executed,
it returns the contents of its two byte parameter field
address (PFA). The value held in the PFA may have two
meanings. If the value returned is less than 128, then it is
the Key's identification number (ID). If it is greater than
128, then the value returned is a virtual pointer to a
sealed record containing the Key's ID number. The Key ID
value FFH is reserved for the null Key, while the value 00H
is reserved for the system's Key. Also the value FEH is
used in access descriptors as a substitute for the IDs of
deleted Keys. The use of Keys is discussed in
greater detail in Chapter VI. The Key vocabulary, besides
containing Keys, contains words; these words are stored in
EPROM.

(3). Block Number Dictionary. The segment
containing this is divided into three parts. Four bytes are
set aside as the segment allocation table, four bytes are
used as the segment allocation sequencer table, and the rest
of the segment is used as a vocabulary for virtual block
numbers. Each bit in the segment allocation table
represents a segment. If a bit is set to one, the corresponding
segment has been allocated. The sequencer table has only
one bit set, the one corresponding to the last segment
allocated.

The virtual block numbers are maintained
as a FORTH vocabulary, as are the Keys. Also like the Key
vocabulary, the definition of the block number vocabulary is
located in EPROM. However, unlike the Keys, virtual block
numbers are fixed-length-name, one-byte constants. This
allows virtual numbers to be assigned to all of the
originally unallocated segments. This limits block numbers to
four characters in length. This dictionary is static and
always contains 28 entries. Entries are removed from the
dictionary by blanking out their virtual number (i.e., the
entry's name field) and setting the smudge bit so they will
not be found. When a virtual block number is entered by the
user, the entire dictionary is searched. For example, the
following keyboard entries would trigger searches of the
dictionary for "1" and "25" respectively.

1  LIST 

25  LOAD 

If "1" had not been found in the dictionary, a block buffer
(located in physical memory) would have been allocated to
virtual block "1." The virtual number "1" would not be
entered into the block number dictionary until it was
written to E²PROM. If "25" had not been found, the usual
FORTH error condition would have been raised.

(4). Database Segment. This block is
broken into two parts. The first contains a jump table into
the DB dictionary. There is one jump vector for each
printable ASCII character allowed by the system (a maximum of
64). A character's jump vector is hashed to using the
following equation on the character's hexadecimal value
(called "char").

Location of jump vector =

((char - 20H) * 2) + FF00H

If the vector is equal to zero, then the character is
punctuation (as described in Table I). Punctuation is not
stored in the DB dictionary. If the vector is equal to
FFFFH (uninitialized E²PROM), then there are currently no
wordds in the dictionary starting with that letter;
otherwise the vector is the virtual address of the first
physical record in an alphabetical linked list of wordds
beginning with that letter. The next four bytes of this
segment contain a bit map of the segments. Like the segment
allocation table, a bit is set to one if the corresponding
segment belongs to the database.
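
The jump table lookup can be illustrated with a short Python sketch. It assumes that the offset subtracted from the character code is the ASCII code for a space (32 decimal, 20H), so that the 64 permitted characters run from 20H to 5FH; that reading, like the function names, is an assumption and not a statement of the original code.

```python
WINDOW_BASE = 0xFF00
FIRST_CHAR = 0x20     # ASCII space, 32 decimal (assumed offset)
MAX_CHARS = 64        # maximum number of jump vectors given in the text


def jump_vector_location(char):
    """Window address of the two-byte jump vector for `char`."""
    index = ord(char) - FIRST_CHAR
    if not 0 <= index < MAX_CHARS:
        raise ValueError("character outside the jump table: %r" % char)
    return WINDOW_BASE + index * 2


def interpret_vector(vector):
    """Describe what a two-byte jump vector value means."""
    if vector == 0x0000:
        return "punctuation"      # not stored in the DB dictionary
    if vector == 0xFFFF:
        return "no wordds"        # uninitialized E2PROM
    return "list head"            # virtual address of the first wordd
```

Under these assumptions the vector for "F" (46H) lives at FF4CH, and the whole table occupies FF00H through FF7FH, the first half of the segment.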

The second half of this database segment
is used for the beginning of the file and field name
vocabulary. Field entries are simply FORTH constants which return
their field ID number (0 to 255). File entries are modified
FORTH vocabulary definitions (they contain five extra bytes
used to store pointers to the first and last records in the
file, and a field count). The field names are entries into
the "file vocabulary" to which they belong. This allows
FORGET to be used to delete files. Of course FORGET is not
sufficient by itself; the virtual memory allocated to the
forgotten entries must be turned back to the system.
Because of the nature of record entries in the PDBMS, fields
cannot be individually forgotten. As with the Key
vocabulary, the file vocabulary definition, as well as some other
words, resides in EPROM.

When information is added to the database,
it expands in three ways. First, the file and field
vocabulary grows to accommodate new file and field definitions.
This dictionary may spill into additional segments.
Allowing this dictionary to exist in more than one segment
creates some problems which must be specifically addressed
by the interpreter/compiler. Off-segment references can
only address 16-byte physical records, so entries of this
type cannot be positioned in a "format-free" manner. Thus
entries in this vocabulary are all placed in memory taking
the physical record into consideration (i.e., beginning on a
physical record boundary). A benefit of this is that the
entries may be mixed into the same segments with the DB
entries, file logical records, and sealed Keys.

The database itself may be considered a
totally inverted file system. Records contain only PDBMS
information and pointers to dictionary entries of wordds
which appear in the record. Figure 4.5 shows a typical
entry in the PDBMS. The system knows how many fields are in
the currently open file, so it uses the last field's
end-of-field (EOF) as the end of record marker (EOR). The
EOF is the same character as the null Key, making FFH (blank
E²PROM) a general system end-of-data marker. When a logical
record is broken over a physical record boundary, the last
two bytes of the physical record contain a pointer to the
next physical record.

Fields are strings of ASCII characters, each letter
followed by an entry ID number. The ASCII letters are the
initial letters of the wordds (i.e., transformed uwords)
originally entered into the record by the user. The letters
are used to hash to the jump vector table on the first
segment of the database. DB dictionary entries are
maintained in an alphabetical linked list. The correct wordd
corresponding to the uword entered into the record is found
by matching the ID number following the letter used as input
to the hash function to the ID number of a wordd on the
linked list hashed to. Punctuation is not followed by an ID
number and the record decoding routines "know" not to look
for an ID number in the record because punctuation jump
vectors are equal to zero.
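
Decoding a field as described above can be sketched in Python. This is an illustration under stated assumptions, not the PDBMS decoder: the DB dictionary is modeled as a nested Python dict keyed by initial letter and ID number, and the `PUNCTUATION` set is a hypothetical stand-in for the characters whose jump vectors are zero.

```python
# Hypothetical stand-in for the characters whose jump vectors are zero.
PUNCTUATION = set(" .,:-()/")


def decode_field(data, dictionary):
    """Rebuild the text of one field from its stored bytes.

    data:       bytes of a single field, without the end-of-field marker.
    dictionary: {initial_letter: {wordd_id: wordd}}, a stand-in for the
                linked lists reached through the jump vector table.
    """
    pieces = []
    i = 0
    while i < len(data):
        char = chr(data[i])
        i += 1
        if char in PUNCTUATION:
            pieces.append(char)          # punctuation is stored as itself
            continue
        if i >= len(data):
            raise ValueError("initial letter without an ID byte")
        wordd_id = data[i]               # the byte after the letter is the ID
        i += 1
        pieces.append(dictionary[char][wordd_id])
    return "".join(pieces)
```

For example, with "FORGET" and "FORTH" stored as F-wordds 1 and 2, the byte pair ("F", 2) decodes to "FORTH" while a following space is copied through unchanged.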

Figure 4.6 shows a typical dictionary
entry. This structure is an expanded and modified version
of the one used in Craig language translators [5]. The
entries are designed to take advantage of the alphabetical
nature of English language dictionaries. The first byte
contains a zero and is ignored when traversing the DB
dictionary during a wordd look-up. It is placed there to
prevent an accidental retrieval by non-dictionary routines
which always treat the first byte as a Key. The second
byte, the copy byte, contains the number of leading
characters in the current wordd which match the leading characters


Key  ID 
Key  10 


Figure  4.5  Database  Physical  Record  structure 
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in  the  previous  wordd  on  the  linked  list.  The  link  bytes 
contain  a  pointer  to  the  next  wordd  in  the  linked  list.  The 
add  byte  contains  a  number,  which  when  added  to  the 
“copy  byte  ♦  1"  character  of  the  previous  wordd  yields  the 
correct  "copy  byte  ♦  1“  character  of  the  current  wordd.  The 
bytes  following  the  add  byte  contain  the  ASCII  characters  of 
the  current  wordd  after  the  "copy  byte  ♦  1"  character.  The 
last  character's  high  bit  is  set  to  one  as  an  end  of  string 
delimiter.  If  there  are  no  characters  following  the 
"copy  byte  ♦  1"  character  then  the  byte  following  the  add 
byte  contains  FFH  (which  translates  to  an  ASCII  delete). 
The  wordd  ID  byte  contains  the  wordd 's  ID  number.  This  is 
used  when  decoding  records.  Figure  4.6  shows  how  the  DB 
entries  for  "FORGET"  and  "FORTH"  would  appear  if  they  were 
consecutive  entries  and  "FORGET”  was  the  first  "F  wordd." 
Following  the  last  unique  character  is  a  linked  list  of 
field  ID  numbers  with  pointers  to  records  containing  the 
field  associated  with  its  corresponding  field  ID.  These 
field  numbers  and  pointers  are  used  in  retrieval  operations. 
Records  are  retrieved  by  specifying  field  names  and  uwords. 
Obviously  punctuation  cannot  be  used  for  retrieval  since 
only  wordds  are  stored  in  the  DB  dictionary. 
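
The copy byte and add byte amount to a form of front coding. The
Python sketch below shows the arithmetic only, under stated
assumptions: wordds are plain non-empty strings, the link bytes, ID
byte, and high-bit delimiter are left out, and the function names are
invented for the illustration.

```python
def encode_entry(prev: str, cur: str) -> tuple[int, int, str]:
    """Return (copy, add, unique) for `cur` relative to the previous wordd."""
    # The "copy byte + 1" character must exist in both wordds.
    limit = min(len(prev), len(cur)) - 1
    copy = 0
    while copy < limit and prev[copy] == cur[copy]:
        copy += 1
    add = ord(cur[copy]) - ord(prev[copy])
    return copy, add, cur[copy + 1:]


def apply_entry(pad: str, copy: int, add: int, unique: str) -> str:
    """Rebuild the current wordd from the previous one held in the PAD."""
    return pad[:copy] + chr(ord(pad[copy]) + add) + unique


entry = encode_entry("FORGET", "FORTH")
print(entry)                          # (3, 13, 'H')
print(apply_entry("FORGET", *entry))  # FORTH
```

For "FORGET" followed by "FORTH" the sketch yields a copy count of 3,
an add value of 13 ("G" plus 13 is "T"), and the single unique
character "H".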

Figure 4.7 shows how the dictionary is traversed to find the
desired wordd. Wordds are reassembled in the PAD by making
the changes indicated by the copy byte, add byte, and unique
characters as the list is traversed. That is, when the DB
dictionary linked list is entered, the first wordd in the list
is copied out into the PAD. If this is not the target wordd,
then the second entry in the linked list is moved to. Using
the information in the copy byte, the add byte, and the unique
characters, the second wordd in the list is constructed. In
moving from "FORGET" to "FORTH" as shown in Figure 4.6,
"FORGET" would be written into the PAD as the first wordd in
the linked list of "F wordds." When the search continued past
"FORGET" because it was not the target wordd, the first three
letters in the PAD would be left because the copy byte of the
second entry is 3. Then 13 would be added to the fourth
letter (G) because that is the contents of the add byte. This
would change the fourth letter from a "G" to a "T." Then the
fifth letter, and any subsequent ones, would be replaced by
the unique characters (in this case "ET" would be overwritten
with an "H"). At this point the PAD contains the wordd
"FORTH."

Figure 4.6 Structure of a DB Dictionary Entry.

Once a wordd has been placed into the dictionary, its first
physical record is never returned to the system to be
reallocated. If all instances of a wordd are removed from the
database, the high bit of the copy byte is set to one.
Subsequent searches of the dictionary will not "see" a wordd
if its copy byte contains a negative number (two's
complement). Because the dictionary is a linked list, this
memory may be reused in the same list by reattaching it at a
different point in the list. When the first record is reused,
the new wordd placed in it uses the ID number assigned to the
first wordd to use the record. This is done to make ID
assignment easier and to stave off the possibility of running
out of ID numbers*. Physical records other than the first may
be returned to the system when a wordd is deleted.

In segments acquired by the system to accommodate database
expansion, only 15 physical records are used for the database.
The first record (record 0) contains administrative
information such as a record allocation map for the segment.


* The maximum ID number is 255. The statistics in Appendix B
indicate that, even in an aggregate of four address books, the
maximum number of unique wordds is not that large.

Figure 4.7 DB Dictionary Wordd Look-up.

c. Screen Segments

These segments belong to the user for use as FORTH screens. A
screen segment is divided into two parts. The first physical
record contains the screen's access descriptor. The rest of
the records contain the part of the segment the user sees as a
screen. A screen consists of 16 rows of 15 characters. This
is much smaller than the standard FORTH screen, which is 16
rows of 64 characters. The smaller screen is better suited to
the 2 row by 20 character LCD.

When the system is first initialized (i.e., when the software
is first placed on the hardware), some of the screen segments
are used to store system messages, as in standard FORTH.
Additionally, some screens are used to store some of the
definitions used in the PDBMS, particularly those used with
the naive user interfaces. This allows the user to eliminate
or change these definitions and system messages as he sees
fit.


V.  THE  DEVICE  DESCRIPTION


At the time of this writing, the PDBMS is in the process of
being prototyped. This first prototype is not intended to
meet all of the desired characteristics of a PDBMS. For
example, it cannot be hand-held because it is bread-boarded
and a standard keyboard is used; additionally, it requires
more than one power supply because not all of the CMOS
components have been received. What is described in this
chapter is the outline of the final prototype as it is
envisioned at the present time. For the most part, this is a
description of the PDBMS as it would appear to the user.

A.  THE  HARDWARE 

From the user's point of view, the hardware consists of four
major components: 1) the enclosure, 2) the display, 3) the
keyboard, and 4) the electronics inside. These aspects
involve how the system physically appears to the user, not how
he perceives it to work.

1.  The  Enclosure

The enclosure should be as small as possible and yet still be
useful. The major constraints upon how small the PDBMS can be
made are the size of the display and the keyboard. The
minimum practical size available with currently available
products is approximately 9 inches (23 cm) by 4 inches (10 cm)
by 1 inch (2.5 cm). This is the average size of most of the
hand-held computers today, such as those made by Panasonic,
Radio Shack, and 1X0 [6 and 7]. These systems tend to weigh
around 14 ounces (400 gm). Their size seems to be the
smallest practical one in order to keep the keys far enough
apart to minimize the chances of hitting the wrong key or
hitting two keys at once*. It is doubtful that the display
will be shrunk; if anything, future displays will be larger
and allow smaller fonts, thus allowing more information to be
shown. Ultimately, it could be possible for the display to
dominate the front of the PDBMS if voice input were
incorporated. This would most certainly require a large
display because function keys would probably not be used (or
even desired) and the system would be expected to echo all
vocal input so that the user could verify that he had been
correctly understood.

The back of the enclosure opens to allow batteries to be
changed and E²PROM to be added or taken out. This last
feature would not only allow the user to expand his memory (or
treat it like a floppy disk, i.e., interchangeable secondary
storage), but also allow the transportation of software and
data from one PDBMS to another by a means other than through
the RS232 port. The hardware and software of the first
prototype do not include an ability to add more E²PROM, but
the required modifications are minor.

It should be mentioned that the current implementation of Keys
does not gracefully support the transportation of sealed
objects from one system to another by physical transportation.
There is no way to guarantee that security would be uniformly
enforced, independent of the system in which the objects are
found, because key assignments are local in context.


* The size of the keys is really unimportant so long as the
user feels comfortable using them. This normally is taken to
mean that the keys should not be physically uncomfortable to
use and they should provide some sort of tactile and audible
response upon being struck.


2.  Display


The current display is an LCD which contains two rows of 20
characters each. This is larger than the displays in most of
the currently available hand-held computers. These normally
have one row of 16 to 20 characters. It was felt that two
lines were the minimum acceptable number of lines for the
PDBMS. Two lines allow user commands and responses to appear
on one line and the system responses and prompts to appear on
the other. This allows the user to compare his commands and
responses with the system's. Ideally the PDBMS should have a
larger display. The largest LCD displays available at this
time have four lines with 40 characters per line; however,
these are too expensive to be compatible with the cost
criteria of the PDBMS⁷.

3.  The  Keyboard

Most of the keys should be 3/16 inch (0.5 cm) square and
protrude from the keyboard background by 1/8 inch (0.3 cm).
The keys are separated by 1/4 inch (0.6 cm). These dimensions
are used on most of the Hewlett-Packard calculators for the
arithmetic keys (i.e., + - ÷ x). Using them as an example,
the author found that keys were easily differentiated from one
another, and two or more keys were almost never pushed
simultaneously. The keys should be arranged by function with
the background colored differently for the letters, numbers,
and special function keys, similar to what was done on the
Quasar and Panasonic computers [6]. The on/off switch should
be away from the other keys and be a sliding switch, not a
push switch. This should be done to help prevent the
accidental switching on or off of the power.

The letter keys should be arranged in the standard "QWERTY"
format, not only because of its entrenched place in the
English-speaking world [1], but also because it has been found
to be more effective than previously thought relative to some
keyboards designed using human engineering principles,
especially with novice users [3]. At present only upper-case
letters are planned to be provided to the user for text entry.
Below is a list of the keys and their functions.

a.  Letter  and  Digit  Keys 

These keys act in the usual and expected fashion; they are
used to enter the ASCII representation of the desired
character. Input from these keys is handled as it normally
would be in any FORTH system. The letter keys may also be
used as "function keys." When shifted, using the shift key,
the ASCII code for the key's lower-case equivalent is
generated. These "illegal" characters are treated similarly
to LaFORTH words; that is, they are interpreted immediately
upon input [9]. Initially the function accomplished by these
words is to place into the input message buffer and the LCD
window the ASCII string representation of other words; they do
not appear in the input message buffer or on the LCD. For
example, in the database management application a shift-G
causes the word GET to be placed in the message buffer and the
LCD window, so when the return key is eventually pushed, WORD
will find GET in the buffer, not shift-G. Notice that the
keys may perform different functions depending upon the
current vocabulary.

b. Mathematical Keys

These keys are similar to the shifted letter keys; however,
they act as input immediate words without shifting them. That
is, they always cause a search of the current vocabulary.
This was done so that the user can choose to use either infix
or postfix notation (infix notation is the default definition
of these keys in the "naive" calculator vocabulary). These
keys include the following five keys:

+ - x ÷ %

c.  Special  Function  Keys 

These keys are the usual terminal editing keys, and with the
exception of the "NEXT" keys, they are not programmable. The
keys are described below.

(1). Enter. This key causes a carriage return and line-feed
to be placed into the input, which is reflected upon the LCD.
This causes the interpreter to begin parsing the input.

(2). Del. This causes a control-H to be input and acts as a
character deletion key. It backs up the cursor one position
and displays a space on the LCD.

(3). →. This moves the cursor to the right one character
position without affecting the contents of the LCD window or
the message buffer.

(4). ←. This moves the cursor to the left one character
position without affecting the contents of the LCD window or
the message buffer.

(5). Shift. This is a non-locking shift key used with other
keys to elicit their alternate definitions.

(6). X>. This deletes all input from, and including, the
current cursor position to the end of the line.

(7). NEXT↑ and NEXT↓. These keys are used to scroll the
display to the next line above or below, respectively. In the
database application, the shifted NEXT keys are used to scroll
to the next field above and below the current field. This
allows fields to include carriage returns and line-feeds so
that a field need not be constrained to one logical line on
the display.

B.  THE  SOFTWARE

When the user initially receives the system, he is presented
only with two functions: a calculator and a database manager.
He does not have direct access to ROOT. This was done to help
prevent the user from inadvertently destroying the system
before he understands it. For example, it prevents him from
redefining or forgetting a word accidentally. The user can
expand the scope of the system gradually as he learns more
about it until he can, if he chooses, run it strictly in FORTH
(or even redesign the system to a great extent). This
flexibility is gained by using FORTH execution vectors. In
the case of interfacing with different levels of users, there
is a different version of FIND for each level of user
sophistication. So as the user becomes more adept with the
system, the vector associated with FIND is simply made to
point to a new, more powerful version of FIND's run-time code.
The version initially available to the user only searches the
limited calculator and database management vocabularies; the
ROOT vocabulary is not searched. The version available to the
most sophisticated user includes a modified version of the
standard FORTH FIND. All FINDs have been modified to be a
little more user friendly. Instead of reporting the usual,
"IS UNDEFINED," when a word is not found, the PDBMS reports
the current vocabulary's name as well. So for example if the
user entered a (:) when he was using the database vocabulary
where it is undefined, the system would report, "NOT DATABASE
WORD." Notice that this message may fall off the right-hand
side of the display for some words; but the first word of the
message should cue the user to the error, and if he then
realizes that he has forgotten what the current vocabulary is
he can move the display to the right using the cursor control
keys.
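
The vectored FIND can be pictured with a short Python sketch. It is an
illustration only: the vocabulary contents, the two user levels, and
the names `make_find` and `System` are assumptions, and a Python
attribute stands in for the FORTH execution vector.

```python
# Hypothetical vocabulary contents, for illustration only.
VOCABULARIES = {
    "CALCULATOR": {"+", "-", "="},
    "DATABASE": {"GET", "PUT", "MAKE"},
    "ROOT": {":", "FORGET", "FIND"},
}


def make_find(fallbacks):
    """Build a FIND that searches the current vocabulary, then `fallbacks`."""
    def find(word, current):
        for name in (current, *fallbacks):
            if word in VOCABULARIES[name]:
                return name
        # The message names the current vocabulary, as described above.
        raise LookupError(f"NOT {current} WORD")
    return find


naive_find = make_find(())           # sealed: current vocabulary only
expert_find = make_find(("ROOT",))   # also searches ROOT


class System:
    def __init__(self):
        self.find = naive_find       # the vector; repointed as the user advances

    def lookup(self, word, current):
        try:
            return self.find(word, current)
        except LookupError as err:
            return str(err)


pdbms = System()
print(pdbms.lookup(":", "DATABASE"))   # NOT DATABASE WORD
pdbms.find = expert_find
print(pdbms.lookup(":", "DATABASE"))   # ROOT
```

Only the vector changes between user levels; every caller keeps
invoking the same `lookup`.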

There is no editor in the "initial" system because all of the
needed functions are available through the keyboard keys,
making the PDBMS a full-screen editor, albeit a small screen
editor. There is an editor vocabulary which is defined in the
PDBMS after ROOT and ASSEMBLER. This editor is only needed
once the user has begun working directly with screens. Figure
5.1 shows the vocabulary structure of the PDBMS. The concept
of sealed vocabularies* is employed; however, notice that some
words link one vocabulary temporarily to others. For example,
SEAL causes a search of the Key vocabulary. SEAL and UNSEAL
are defined in the DB vocabulary to be themselves (i.e., they
simply point to their definitions in ROOT). This allows them
to be used by the naive user without directly accessing the
root vocabulary. E²PROM permanent vocabularies (i.e., Key,
file, and virtual block) are not linked through each other or
those vocabularies defined in RAM. Thus FORGETting a
definition in RAM which precedes a file, block, or key
definition will not erase any E²PROM definitions¹⁰.


* These are vocabularies which confine word searches to
themselves, and usually FORTH. The FIND used in fig-FORTH
searches all parent vocabularies of the current vocabularies.
The calculator and database vocabularies are totally sealed in
that not even the root vocabulary is searched.


¹⁰A sometimes problematic feature of standard FORTH is that
all definitions are actually maintained in one straight

Figure 5.1 PDBMS Vocabulary Structure.


1.  The  Calculator


Initially the calculator is entered by pushing shift-C. This
places the user into the calculator context, whose vocabulary
contains redefinitions of +, -, x, and ÷ so that they are
infix operators. FIND has been modified so that if a word is
not found and an equal sign has been previously interpreted, a
constant is created. This allows the user to store temporary
results by creating "variables" simply by using an undefined
word. For example,

1 + B = A

would cause "A" to be created. If "B" had not been previously
defined, an error condition would be raised when it was not
found in the dictionary. The equal sign is an input immediate
which causes "A" to be created, if need be, and sets up an
execution vector to cause the ENTER key to store the top of
the stack into "A."
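
The behaviour of the equal sign can be sketched in Python as below.
Several simplifications are assumptions of the sketch rather than
statements about the PDBMS: evaluation is strictly left to right with
no precedence, "/" stands in for the ÷ key, operands are whole
numbers, and the store happens as soon as the name is read instead of
being deferred to the ENTER key.

```python
import operator

# "/" is an ASCII stand-in for the divide key (an assumption).
OPS = {"+": operator.add, "-": operator.sub,
       "x": operator.mul, "/": operator.floordiv}


def calculate(line: str, variables: dict) -> int:
    """Evaluate a blank-separated infix line, left to right."""
    acc = 0
    pending = operator.add
    store_next = False
    for tok in line.split():
        if store_next:
            variables[tok] = acc        # create the name if need be
            store_next = False
        elif tok == "=":
            store_next = True
        elif tok in OPS:
            pending = OPS[tok]
        else:
            # An undefined name before "=" raises KeyError.
            value = int(tok) if tok.isdigit() else variables[tok]
            acc = pending(acc, value)
    return acc


names = {"B": 4}
print(calculate("1 + B = A", names))   # 5
print(names["A"])                      # 5
```

Had "B" been missing from `names`, the lookup would fail before "A"
was ever created, matching the error case described above.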

Because a derivative of FORTH is used, floating point
arithmetic is not used. The system defaults provide the user
with a fixed two digits behind the radix point. Like FORTH,
the user may choose any base (radix) for arithmetic
operations, within the limits of the number of input symbols
available.
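
Two fixed digits behind the radix point are commonly obtained with
scaled integers. The Python sketch below illustrates that general
technique; it is not taken from the PDBMS. It assumes base ten,
non-negative values, and truncation rather than rounding, and the
function names are invented.

```python
SCALE = 100  # two digits behind the radix point


def to_fixed(text: str) -> int:
    """Parse a non-negative decimal string such as '1.5' into hundredths."""
    whole, _, frac = text.partition(".")
    return int(whole) * SCALE + int((frac + "00")[:2])


def fixed_mul(a: int, b: int) -> int:
    """Multiply two scaled values, truncating to two places."""
    return a * b // SCALE


def show(n: int) -> str:
    return f"{n // SCALE}.{n % SCALE:02d}"


a, b = to_fixed("1.5"), to_fixed("2.25")
print(show(a + b))            # 3.75
print(show(fixed_mul(a, b)))  # 3.37
```

Addition and subtraction work on the scaled integers directly; only
multiplication and division need the extra rescaling step.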

2.  The  Database

Initially the database management system is entered by pushing
shift-D. This vocabulary allows users to create files, create
records, retrieve records, update records, delete records, and
delete files. Additionally the user may create and delete
Keys, and use Keys to lock records and other Keys.

forgotten even if they are not in the current vocabulary.
When there are multiple vocabularies, this can create dangling
pointers in vocabulary definitions.

a.  Keyboard  Key  Definitions 

When the user is placed into the database context the NEXT
keys are redefined as described before. Besides those two
keys, the following shifted characters are defined. These
keys are described below. The word which appears on the
display and in the input message buffer when the key is pushed
is shown in parentheses.

(1). D (DELETE). This is used to delete a file, record, or
Key. There are three different DELETEs, one in each of the
DB, file, and Key vocabularies. Each DELETE affects only
those elements in its respective vocabulary. The DELETE in
the file vocabulary deletes files, the one in the Key
vocabulary deletes Keys, and the one in the DB vocabulary
deletes the current record.

(2). F (FILE). This word changes the context for the
interpretation of the words following it in the input stream
so that the file vocabulary is searched. The context reverts
to the DB ("calling") vocabulary when the first word not found
in the file vocabulary is encountered. The last filename
mentioned before the context is switched out of the file
vocabulary becomes the "current file."

(3). G (GET). This is used to initiate a record retrieval.
Table III shows a typical record retrieval procedure. First
the user is asked if the current file is the one to be
searched, or asked for a file if there is no current file.
Then the user is presented with the names of the fields of the
records in the file so the user can enter values which are to
be used as key attributes for retrieval. If the user does not
desire to enter a value for a particular field, he simply
presses the ENTER key. The query in Table III is a request
for any record in the ADDR-BK file which contains "TABETHA" in
its NAME field and "MONTEREY" or "VA." in its CITY/ST field.
Before actually performing a retrieval operation, the user is
asked if he still desires to do the retrieval, allowing him to
abort a query if he has realized that he has made a mistake.

TABLE III
Record Retrieval

_GET_
FILE ADDR-BK?
_YES_
NAME?
_TABETHA_
STREET?
_<ENTER>_
CITY/ST?
_MONTEREY VA._
PHONE?
_<ENTER>_
MISC?
_<ENTER>_
GET?
_YES_
1 RECORD FOUND
PUSH NEXT

(4). H (HIDE). This is used to make a Key which has been made
known through an UNSEAL operation unknown.

(5). K (KEY). This word changes the context for the
interpretation of the words following it in the input stream
so that the Key vocabulary is searched. As with the shift-F,
the context reverts to the calling vocabulary when the first
word not in the Key vocabulary is encountered. This word does
not affect any Keys or the Key vocabulary; it is only used as
a prefix word for MAKE and DELETE.

(6). M (MAKE). This word, like DELETE, exists in the DB,
file, and Key vocabularies. Each version creates a record,
file, or Key, respectively.

(7). N (NO). This is used as an answer to appropriate system
prompts.

(8). P (PUT). This is analogous to SAVE-BUFFERS and FLUSH in
that it writes the current record to secondary storage.

(9). R (RECORD). This word is included for consistency
reasons. It is used to preface DELETE and MAKE when the user
wishes to use the DB definitions of these words. The DB
DELETE and MAKE must be prefaced by RECORD so that there is
less chance of an accidental record deletion.

(10). S (SEAL). This is used to seal a Key or the current
record. It is simply defined as:

: SEAL ROOT SEAL ;

This allows the user access to the root word SEAL without
directly accessing the root vocabulary.

(11). U (UNSEAL). This word is used to unseal all objects
sealed with one or more Keys. It, like SEAL, is simply
defined in terms of the root word UNSEAL.

(12). Y (YES). This is used as an answer to appropriate
system prompts.

b. File Creation

Files are created simply by using the words FILE and MAKE.
Upon entering shift-F (or FILE) and shift-M (or MAKE), the
user needs only to follow the system's prompts. Table IV
shows the file creation sequence. The user's input is
underscored. The user always gets an additional field called
"miscellaneous" added to the bottom of all records. This is
included because it was found that people's personal data does
not normally fit a uniformly structured record.
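
The way shifted letters such as shift-F and shift-M expand into whole
words in the message buffer can be sketched in Python. The key-to-word
table is the one listed for the database context; the function name,
the use of a lower-case character to represent a shifted key, and the
trailing blank after each expanded word are assumptions of the sketch.

```python
# Shifted-letter definitions in the database context.
DATABASE_KEYS = {
    "D": "DELETE", "F": "FILE", "G": "GET", "H": "HIDE",
    "K": "KEY", "M": "MAKE", "N": "NO", "P": "PUT",
    "R": "RECORD", "S": "SEAL", "U": "UNSEAL", "Y": "YES",
}


def type_keys(keys: str, keymap: dict) -> str:
    """Return the message buffer after the given keystrokes.

    A lower-case character stands for a shifted letter key and is
    replaced by its word; everything else is entered literally.
    """
    buffer = []
    for key in keys:
        if key.islower():
            buffer.append(keymap[key.upper()] + " ")
        else:
            buffer.append(key)
    return "".join(buffer)


print(type_keys("fm", DATABASE_KEYS))        # FILE MAKE
print(type_keys("kmSECRET", DATABASE_KEYS))  # KEY MAKE SECRET
```

Swapping in a different table for another vocabulary gives the same
keys different meanings without touching the input routine.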

TABLE IV
File and Key Creation

File Creation

_FILE MAKE_
NAME?
_ADDR-BK_
FLD 1 NAME?
_NAME_
FLD 2 NAME?
_STREET_
FLD 3 NAME?
_CITY/ST_
FLD 4 NAME?
_PHONE_
FLD 5 NAME?
_<ENTER>_
FLD 5 MISC OK

Key Creation

_KEY MAKE SECRET_
OK

c. File Deletion

File deletion is simply effected by the sequence shown in
Table V. File deletion is not a trivial matter since the
E²PROM is organized as a heap with physical records containing
a mixture of sealed Keys, DB dictionary entries, and records
from various files. First of all, a user cannot delete a file
unless he has unsealed all of the records in it, so DELETE
must make one pass of all the records in the file to ensure
that they are all unsealed. If all of the records are
unsealed, then a second pass is made of the records,
reallocating all of the physical records back to the system
(i.e., setting their corresponding bit to zero in the record
bit map). Additionally, on this pass the first byte of each
physical record is set to 80H (the system's Key) while the
second byte is set to FFH (the null Key). Then the DB
dictionary must be searched for all references to the deleted
field numbers, and these must be removed. When a field
reference is removed from a wordd's list of field IDs, the
hole created by this deletion is filled by moving the last
entry on the list up to the vacated spot. Physical records
vacated by this operation are returned to the system. Finally
the file's vocabulary and its field entries can be forgotten.
Obviously file deletion is a lengthy and complicated process.

d. Key Creation

Creation of a Key is very simple, as shown in Table IV. The
example shows the creation of a key named "SECRET." All that
is required to create a Key is the addition of "SECRET" into
the Key dictionary as a constant and initializing it to the
next available Key ID number.


converting the deleted Key's ID to FEH (the deleted Key ID).
After this is done the Key is deleted from the dictionary. A
sealed Key's physical record is returned to the system, after
setting the first byte to 80H (the system Key) and the second
byte to FFH (the null Key).

f.  Record  Creation 

To the user, the record creation dialogue is similar to the
one associated with file creation. What is involved is
collecting the desired data, encoding it**, finding physical
records to hold the logical record, and finally linking the
record into the parent file's linked list of records.
Currently the linked lists of records are maintained in
chronological order (i.e., as a circular queue). This may be
frustrating in some applications where the user would like to
peruse the database in some specified order. For example, it
is not possible to view the records of an address book
alphabetically by surname, unless they were originally entered
in that order. Because of the unformatted nature of the
fields, it is very difficult to sort a file by key attributes.

It would not be too difficult to allow the user to specify a
record ordering other than chronological. This could be done
by allowing the user to flag a wordd in the record as the
sort-key-value (for example the last word in a record starting
with the character "@"). Then when the record was PUT into
the database, it would be inserted into the file's linked list
alphabetically relative to the other "@-wordds" in the file's
other records. So the user could maintain the file sorted by
surname by prefacing all surnames with an "@"¹².

** This includes converting the uwords to punctuation and
wordds, and then the addition of the wordds into the DB
dictionary.


TABLE VI
Record Creation

_RECORD MAKE_
NAME?
_JOHN DOE_
STREET?
_mm isis si ns_
CITY/ST?
_PORTLAND, ORE._
PHONE?
_<ENTER>_
MISC?
_<ENTER>_
OK


Table VI shows a typical record creation sequence. Notice
that no phone number was given; a null entry is signalled by
hitting the ENTER key. Also notice that there is an implicit
"current file." This file is the last one referred to after
the last use of FILE; had no file been explicitly referenced
before a record creation was attempted, the PDBMS would have
requested a file name. If the file was not found, the user
would have been asked if he desired to create a file or abort
the record creation.


Footnote 12: This may not appeal to many users, but it would
not necessarily have to appear in the name field. The
"@-surname" could be placed in the "miscellaneous" field.

g. Record Deletion

Record deletion is requested by the user in the
same fashion as file and key deletion. Record deletion
involves first removing the record from the file's linked
list by making the two records adjacent to the current
record point to each other. These links are found in the
current record's previous and next link bytes (see Figure
4.5). Then all of the wordd references to the record in the
DB dictionary must be deleted. Finally the physical records
are returned to the system after setting the first byte to
80H and the second to FFH.
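
The unlinking step can be illustrated with a small sketch. It is written in Python for readability and is not the PDBMS implementation: the `Record` class and the `append` and `delete` helpers are hypothetical, and the removal of wordd references from the DB dictionary is deliberately left out. Only the 80H and FFH marker bytes are taken from the text.

```python
FREE_FIRST, FREE_SECOND = 0x80, 0xFF  # marker bytes named in the text


class Record:
    """One record in a circular, doubly linked file list (illustrative)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.prev = self  # previous link
        self.next = self  # next link


def append(head, record):
    """Link a record in as the newest entry; return the list head."""
    if head is None:
        return record
    tail = head.prev
    tail.next = record
    record.prev = tail
    record.next = head
    head.prev = record
    return head


def delete(head, record):
    """Unlink a record, mark its storage free, and return the new head."""
    # Make the two neighbours point to each other.
    record.prev.next = record.next
    record.next.prev = record.prev
    if record.next is record:        # it was the only record in the file
        new_head = None
    elif head is record:
        new_head = record.next
    else:
        new_head = head
    # Mark the physical record as returned to the system.
    record.data[0:2] = bytes([FREE_FIRST, FREE_SECOND])
    record.prev = record.next = record
    return new_head
```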

h.  Update 

Only records may be updated; files and keys
cannot. Records are simply updated by GETting them, modifying
them using the cursor control keys, and then PUTting
them back. Like FORTH, once a change has been made to a
record, it is marked as being updated, whether or not the
change is later undone in the same editing session. Once a
record has been marked as updated and it is PUT, the updated
record is added as a new record, and the old record is
deleted. This is not quite as drastic as it may sound. The
old record is used as a template for encoding the new
record. Wordds which are unchanged can be copied from the
old record directly into the new record. The old record
also contains all of the pointers into the DB dictionary
where new virtual addresses must be substituted, so the
dictionary must be searched only when a new wordd is added.
Record update is actually a record creation and deletion
operation.

It could be possible to allow file editing
(i.e., the addition and deletion of fields) by performing
the same type of operations as are employed in record update
(i.e., creating a new file, transferring the appropriate
data from the old file into the new file, and then deleting
the old file). However, this was considered too complicated
and slow to justify its inclusion for what would probably be
a rare event. Besides, by always including a "miscellaneous
field" in all records, it was felt that this would probably
not be a very necessary operation.

VI. SYSTEM SECURITY ISSUES

As stated earlier, security is important in a PDBMS
because of the personal nature of the information it may
contain. However, the type of security afforded in this
design is probably better suited for a larger system.
Probably all that is required for such a system as the PDBMS
is a simple mechanism which employs one Key or password.
This allows the user to hide anything he desires at one
level of security (i.e., one either has access to all of the
data or has access to only a subset of the data). The PDBMS
uses a much more elaborate system. This was done to test
two things: the feasibility of securing FORTH, and the
feasibility of implementing a security mechanism similar to
the one described in reference [10]. FORTH was chosen as
the language to implement the PDBMS with no firsthand
knowledge of the language. Because it is an interpreted
language, it was felt that there would be no problems with
securing the system. However, after receiving the FORTH
documentation and software many doubts were raised about
whether the language could be secured.

At first one thing which seemed essential to securing
the PDBMS was the restriction of the user's ability to use
assembly language. If the user can write words in assembly
language using physical addresses and ports (the only way to
write such words on the NSC800 since it does not support
segmentation and privileged modes) there is almost no limit
to what he can do. All standard FORTHs are very close to
the hardware and allow words to be written in assembly
language as well as in FORTH. As a matter of fact, FORTH is
so close to the machine that in 8080 fig-FORTH and FORTH-79
it is impossible to prevent the programmer from writing
assembly language defined words without changing FORTH to
such an extent that it is no longer the same language. In
these two systems, the words which are used to specify code
definitions (;CODE, CODE, END-CODE, and {;S}) are all
high-level words (i.e., words written in FORTH as contrasted to
low-level words which are written in assembly language), as
is the assembler. As far as the author can determine, there
is no low-level word which can be "hidden" from the user
without having a detrimental effect and which is required
for entering assembly language defined words.

The word "hidden" is enclosed in quotes in the previous
paragraph because no word can be hidden from a user in his
address space. "Hidden" means that the user either does not
know of the hidden word's existence or does not know where to
find its definition, and cannot execute it directly. A word in
FORTH which can be located can be executed even if it is not
in the FORTH linked list word dictionary (one simply puts
the address of the first executable byte onto the parameter
stack and invokes EXECUTE). If a user is to be allowed to
program in FORTH, he must be allowed to access words in the
ROOT dictionary, and in order to access words, their names
must appear in the dictionary since FORTH searches the
dictionary by name. This makes it very easy for a user to
traverse the dictionary and look at its contents and at the
location of words. It would not be hard, though probably
tedious, to find a word not included in the dictionary by
checking for unaccounted gaps between words in the linked
list or finding a reference to a code field address of a
word which does not appear in the dictionary. If one were
to seriously consider hiding words, the best way to do this
would be to remove all of the headers (the name and link
fields) from all of the dictionary entries. Such a system
could not be extended because no words in the dictionary
could be found (since the name and link fields are necessary
to search for a word). If the PDBMS was to be secured there
had to be another method which either prevented the use of
assembly language or worked regardless of the fact that the
user could use assembly language.

In the PDBMS, FORTH could possibly have been secured
entirely by using software and still allowed the user to
program in FORTH; however, it would have undoubtedly been a
very limited subset of the language. Such a system would
not have needed EPROM; instead a cold boot could have loaded
the system in from E2PROM. Verifying such a system would
surely have been a problem. Instead the PDBMS relies on
both hardware and software to enforce system security.

A. HARDWARE SECURITY MEASURES

In multi-user systems hardware support of security is
essential; in truly secure systems it must be verified that
there are parts of the system that no one but system
administrators can access. In the PDBMS the hardware and
software enforce security to such an extent that even the
owner of the system cannot access parts of the system at
all (see footnote 13). This is desirable because it not only
prevents other persons who are not the owner of the PDBMS from
compromising or destroying the system, but it also prevents the
user from "terminally crashing" the system. Many of the system's
boot parameters are stored in EPROM and E2PROM. If these were
lost, the system could not be booted up.

It is the interaction of the EPROM and the "smart ports"
which is the hardware portion of the system's security.
Simply, the ports which control access to virtual memory,
the keyboard, and the RS232 port only accept instructions
executing from EPROM, as discussed in Chapter IV. Because
EPROM is read-only, the user is forced to use procedures in
it to access these external devices. Thus if the procedures
in EPROM can be verified to be not only correct but also
unsubvertable, then the PDBMS can probably be
made secure (see footnote 14).

Footnote 13: The PDBMS has not been proven correct and secure
in the sense of the ways described in references [11 and 12].
However, the author believes that it can be made secure and
rigorously proven to be so.

B. SOFTWARE SECURITY MEASURES

The hardware in itself does not guarantee a secure
system; there must be some verified software which operates
it. There are three different aspects of the software in
the PDBMS which are used to provide security. A fourth
aspect is mentioned here which is related to security but is
not involved in system security per se. The first three
items are: straight-through code, maintenance of system
parameters and tables in E2PROM, and Keys. The fourth item
is the FORTH concept of execution vectors.

1. Straight-through Code

Contrary to FORTH programming style, words which are
involved with port access must be low-level and indivisible.
This means that these words must not be defined in terms of
other words, i.e., they cannot be colon definitions; they
must be code definitions. For example, it seems obvious
that one would like to write the following low-level words
for use in other system management words because they would
be very commonly used:

E2PROM_ON        ( Turns E2PROM power on )
E2PROM_WRT_ON    ( Turns E2PROM write power on )
WRT_E2PROM       ( Initiates an E2PROM write )
E2PROM_WRT_OFF   ( Turns E2PROM write power off )
E2PROM_OFF       ( Turns E2PROM power off )

Footnote 14: A correct procedure is one that does only what it
is designed to do; nothing more and nothing less.
Unsubvertability is a stronger condition than correctness in
that it means that even combinations of modules of correct
code and portions of modules cannot be made to
interact incorrectly. This is a concern in the PDBMS since
the user can read and execute the system's source machine
code.


However, as mentioned before, if a word exists in the user's
address space, he can find it and execute it. This means
the user could find E2PROM_ON and E2PROM_WRT_ON, and execute
them from EPROM. Then using his own assembly language
routines, he could manipulate the contents of the E2PROM.
The only way to prevent this is to create a minimum set of
virtual memory management words which, once execution of any
one of them begins, never branch out of the word or
return to the inner interpreter without first turning off
access to the ports. Also these words should be written so
that if the user jumps into the center of their code, they
are still correct.
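
The first of these two properties (never returning with port access still enabled) can be illustrated by a loose analogy. The sketch below is in Python rather than NSC800 assembly, and everything in it is hypothetical: the `E2Prom` class stands in for the protected store, and `store_byte` plays the role of a single straight-through word. It cannot model a user jumping into the middle of machine code, so it says nothing about the second property.

```python
class E2Prom:
    """Toy stand-in for the protected store; not the PDBMS hardware."""

    def __init__(self, size=256):
        self._cells = bytearray(size)
        self.power_on = False
        self.write_enabled = False

    def raw_write(self, address, value):
        # The device refuses writes unless both enables are set.
        if not (self.power_on and self.write_enabled):
            raise PermissionError("write attempted while port access is off")
        self._cells[address] = value

    def peek(self, address):
        """Read a cell; present only so the demonstration can be checked."""
        return self._cells[address]


def store_byte(device, address, value):
    """Enable, write, and always disable again before returning.

    There is no separate 'power on' routine for a caller to use alone:
    the enables are set and cleared inside this one routine, and the
    finally clause clears them even if the write itself fails.
    """
    device.power_on = True
    device.write_enabled = True
    try:
        device.raw_write(address, value)
    finally:
        device.write_enabled = False
        device.power_on = False
```

The design point is the absence of free-standing "on" and "off" entry points, which is why the five separate words listed earlier are not suitable as they stand.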

The first requirement is fairly easy to achieve
because these words are resident in EPROM; because they
cannot be altered, if a user jumps to, or into, them it can
be assured that he cannot affect the execution of the words.
The second requirement is much more problematic. Satisfying
this means that the actions of these code sequences can
maintain system security regardless of the actions performed
before and after their execution, and regardless of whether
the entire sequence is executed (i.e., the user jumps into
the middle of a code sequence). For example, the user must
not be able to use the code of one word (whether it is the
entire code sequence or a part of it) to set up the segment
register to point to the Key dictionary, and then by using
another word, retrieve the Key dictionary.

2. Maintenance of System Parameters and Tables in E2PROM

By controlling access to E2PROM it is possible to
use parts of it to store information which the user should
not have access to. Chapter IV discusses the information
which is stored in E2PROM and which is not accessible to the
user. The locations of the parameters and beginnings of
these tables are static so that they may be referred to
directly by using their segment number and E2PROM addresses
(FF00H through FFFFH). These references are found in EPROM
where they are visible to the user. The assurance that the
user cannot directly access these segments must be
incorporated into the design of the straight-through code. The
code must be written so that when control is passed from the
word to the inner interpreter, the user is left with no more
information about the tables and parameters than he is
authorized access to. Any routines which do system table
and parameter maintenance are designed so that they work
directly on the E2PROM and never bring the contents of these
segments into RAM. This makes it easier to ensure the
security of system segments.

The above is not entirely true of the PDBMS. During
retrieval operations, virtual addresses are brought into the
data buffers. Thus the user can gain some information about
the maintenance of the system's segments by dumping the
contents of these buffers. This information is kept in RAM
because it is a "write-intensive" operation. Additionally
it must be left in the buffers after the system is finished
with processing the query because the virtual addresses must
be used to find the records which satisfy the query
conditions. The current record's virtual address is needed so
that if it is updated the location of the old record can be
found and deleted. Thus the user can gain access to the
virtual addresses of records to which he is authorized.
Allowing the user access to the virtual addresses of all of
the records which satisfy a query gives him some information
from which he can make inferences about the allocation of
physical records, including those to which he is not
authorized access. How much information can be gained through
inference seems to be limited by the fact that the segments
in which these records occur contain not only records (which
can use varying numbers of physical records), but sealed
Keys and DB dictionary entries (which also use varying
numbers of physical records). Additionally if any deletions
or updates ever occurred, the physical records may no longer
be allocated in a sequential and chronological manner. Thus
in a mature system (i.e., one which has processed a number of
Key and record additions and deletions), it is questionable
that much meaningful inference can be done. Of course,
the problem can be avoided entirely by keeping all of these
virtual addresses in E2PROM at the expense of system speed
and possible E2PROM "burn-out."

3. Keys

The proper implementation of Keys relies heavily
upon the preceding hardware and software base. Keys are
very simple: nothing is fetched from E2PROM unless the
proper Key(s) has been UNSEALed (or made known). The
operations associated with SEAL and UNSEAL affect the Key
dictionary but have no effect upon sealed objects. As
mentioned earlier, Keys are maintained in a dictionary as
constants. When a Key is UNSEALed, the high bit of its
character count byte is set to one. When a data object
fetch is requested, the object's access descriptor field is
"computed" to see if the requisite Keys have been previously
made known.


The access descriptor fields are limited to the
first physical record for screens (15 Keys), 15 Keys for a
sealed Key (one physical record less one byte for the sealed
Key's ID), and no limit for database records (since they are
permitted to cross physical record boundaries). However, for
consistency from the user's point of view, 15 Keys is the
limit for all system objects. The Keys may be "anded" and
"ored" with each other to form complicated access
mechanisms. This may be further extended by adding layers of
sealed Keys. For example, if access to the current record
required the Keys "CONFIDENTIAL" and "ACCESS," or the Keys
"SECRET" and "ACCESS," the current record could be sealed as
follows:

KEY CONFIDENTIAL ACCESS & SECRET & | RECORD SEAL

or 

KEY  CONFIDENTIAL  SECRET  |  ACCESS  &  RECORD  SEAL 

where "&" is a logical "and" and "|" is a logical "or." If
CONFIDENTIAL's ID was one, SECRET's two, and ACCESS's three,
and the second example above had been used to seal the
record, then the record would have four key bytes which
would contain:

01H 02H 83H FFH

Notice that the high bit of ACCESS's ID was set to one.
This signifies that it is to be "anded." A zero high bit
signifies the Key is to be "ored." Unique "access paths"
are described in both the SEAL process and the access
descriptor because they are specified using reverse Polish
notation.

When an attempted fetch of a record is made, the
fetch algorithm starts by setting a fetch flag to true (the
value one). Then it simply reads each Key ID from the
access descriptor and searches the Key dictionary to see if
the Key is known (i.e., the high bit of its character count
is set to one). If the Key is known, the search returns a
one, otherwise a zero. The result of the search is "anded"
or "ored" with the fetch flag according to the high bit of
the byte in the access descriptor. When the null Key is
found in the access descriptor, the value of the fetch flag
determines whether the object is sealed or unsealed.
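
The encoding and the fetch test described above can be sketched as follows. This is a Python illustration under stated assumptions, not the PDBMS code: the function names `seal` and `may_fetch`, the token-list input, and the use of a set of known Key IDs in place of the Key dictionary search are all hypothetical. The sketch follows the text literally, including the initial flag value of one; a consequence is that "ored" Keys at the head of a list cannot lower the flag, and how the PDBMS treats the first Key is not stated in this section.

```python
NULL_KEY = 0xFF     # terminates the access descriptor
DELETED_KEY = 0xFE  # always known and always "anded", so it changes nothing
AND_BIT = 0x80      # high bit of a Key byte: 1 = "and", 0 = "or"


def seal(tokens, key_ids):
    """Encode a postfix Key expression as access descriptor bytes.

    An '&' sets the high bit of the Key byte that precedes it; a '|'
    leaves that byte's high bit clear. The null Key ends the list.
    """
    out = []
    for token in tokens:
        if token == "&":
            out[-1] |= AND_BIT
        elif token != "|":
            out.append(key_ids[token])
    out.append(NULL_KEY)
    return bytes(out)


def may_fetch(descriptor, known_ids):
    """Evaluate a descriptor against the set of currently known Key IDs."""
    flag = True                              # fetch flag starts at one
    for byte in descriptor:
        if byte == NULL_KEY:
            break
        if byte == DELETED_KEY:
            continue
        known = (byte & 0x7F) in known_ids   # stands in for the dictionary search
        flag = (flag and known) if byte & AND_BIT else (flag or known)
    return flag
```

With the IDs used in the text, `seal("CONFIDENTIAL SECRET | ACCESS &".split(), ...)` yields the four bytes 01H 02H 83H FFH, and the fetch is refused whenever ACCESS is not known.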

Since  the  Key  dictionary  entries  are  maintained  as  a 
FORTH  dictionary  and  FORTH  dictionaries  are  searched  by 
name,  it  may  seem  that  searching  the  dictionary  using  the 
Key's ID may be difficult. It is, in fact, faster than
searching by name. This is because the structure of the
dictionary entries allows the Key's value to be
retrieved easily: it is located in the byte immediately
following the CFA. Searching by name is slower
because it involves string comparisons.

At the root of the Key dictionary (i.e., that entry
whose link is equal to 0030H) is the definition of MAKE.
Below MAKE are all of the other colon definitions in the Key
vocabulary. After the last colon definition is the
definition of the system Key. This is a constant like the other
Keys but its value is 80H and its count byte contains a 00H.
This means that its name's length is zero, and thus it has
no name and cannot be found by a name search of the
dictionary. Because it cannot be found, it can never be
UNSEALed or made known, so the high bit of its character
count will always remain zero. Below the system Key are the
definitions of the null Key and the deleted Key. These
Keys' values are FFH and FEH respectively and their
character count bytes are equal to 80H. This means that they
also have no name and they always remain UNSEALed or known.
Because these three Keys' values are greater than 127, they
are always "anded" into any Key ID list in which they
appear.

Changing a deleted Key's ID number wherever it
occurs in an access descriptor list results in a "sensible"
condition. That is, all other Keys are still required in
their same logical relationship, except that the Key (or
relation) which preceded the deleted Key now takes the
place of the relation between itself and the deleted Key. A
major problem with deleting a Key is that the user may not
realize which data objects he is affecting or how he is
affecting them. This is an unresolved problem in the PDBMS
and it is more complicated than it appears on the surface.

Finally,  there  is  one  last  important  operation  which 
concerns  maintenance  of  the  Key  dictionary:  making  Keys 
unknown.  The  user  can  make  Keys  unknown  on  an  individual 
basis  by  using  HIDE.  For  example, 

KEY  SECRET  HIDE 

makes "SECRET" unknown and seals any objects which are
sealed with SECRET. Whenever a non-maskable interrupt is
generated, the virtual memory manager makes all Keys whose
character count is greater than 80H unknown.
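
The use of the character count byte as a "known" flag can be sketched as below. The sketch is in Python and is illustrative only: the `KeyEntry` class, the list standing in for the Key dictionary, and the helper names `unseal`, `hide`, `is_known`, and `nmi_reset` are assumptions, as is the argument form of the helpers (the PDBMS words take their operands FORTH-style). The byte values for the system, null, and deleted Keys are those given in the text.

```python
KNOWN_BIT = 0x80  # high bit of the character count byte


class KeyEntry:
    """One Key dictionary entry, reduced to the fields discussed here."""

    def __init__(self, name, key_id, count=None):
        self.name = name
        self.key_id = key_id
        # Low bits hold the name length; the high bit marks the Key known.
        self.count = len(name) if count is None else count


def build_dictionary():
    return [
        KeyEntry("SECRET", 0x02),
        KeyEntry("ACCESS", 0x03),
        KeyEntry("", 0x80, count=0x00),  # system Key: nameless, never known
        KeyEntry("", 0xFF, count=0x80),  # null Key: nameless, always known
        KeyEntry("", 0xFE, count=0x80),  # deleted Key: nameless, always known
    ]


def find_by_name(entries, name):
    """Name search; a zero-length name can never match."""
    for entry in entries:
        if name and (entry.count & 0x7F) == len(name) and entry.name == name:
            return entry
    return None


def unseal(entries, name):
    entry = find_by_name(entries, name)
    if entry is not None:
        entry.count |= KNOWN_BIT


def hide(entries, name):
    entry = find_by_name(entries, name)
    if entry is not None:
        entry.count &= 0x7F


def is_known(entries, key_id):
    """Search by ID: a single byte comparison per entry, no string compare."""
    for entry in entries:
        if entry.key_id == key_id:
            return bool(entry.count & KNOWN_BIT)
    return False


def nmi_reset(entries):
    """Make every named, known Key unknown (count greater than 80H)."""
    for entry in entries:
        if entry.count > 0x80:
            entry.count &= 0x7F
```

Because the null and deleted Keys have a count of exactly 80H, the "greater than 80H" test leaves them known after an interrupt, while the system Key's count of 00H can never acquire the high bit through a name search.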

4. Execution Vectors

Execution vectors are used in the PDBMS to allow the
user to interact with only that part of the system which he
understands. However, they can be used to provide system
security to an extent. Simply, if a user does not know how
to change a vector's value (or a collection of vectors) or
what value to change it to, the situation is similar to
needing a password to access a more powerful system. At the
lowest level it is easy to prevent a user from using more of
the system than is desired. If the user is constrained to a
vocabulary which does not contain words which would allow
him to make colon definitions (e.g., {:}) or access memory
directly (e.g., {!}, {@}, etc.) the inner working of the
system can be hidden from him. Making a user more
privileged simply means giving him the name of a word which
changes the values of the execution vectors (of course this
word cannot appear in a listing of the vocabulary). As the
system to which the user gains access becomes more powerful,
it becomes progressively harder to provide system security
by using execution vectors without relying upon hardware.


APPENDIX A

THE LANGUAGE FORTH


A good description of the concepts upon which FORTH is
based may be found in reference [13]. FORTH is a
stack-oriented, threaded, interpretive language. It is noted for
its compact size and fast execution (compared to other
interpreted languages such as BASIC). The 8080 fig-FORTH
model (version 1.3) occupies less than 9K bytes of memory
(which includes the first page of memory occupied by CP/M).
Residing in that 9K is the FORTH interpreter, compiler,
dictionary, and a line editor. There are two "generic"
FORTHs. The older version is usually referred to as
"fig-FORTH"; the newer version is usually referred to as
"FORTH-79." FORTH-79 was designed to be a standard which
establishes the minimum requirements of the language.
Specifically, reference [2] states that the purpose of
FORTH-79 is

... to allow transportability of standard FORTH programs
in source form among standard FORTH systems. A standard
program shall execute equivalently on all standard FORTH
systems.

The bibliography contains a list of sources used by the
author while learning FORTH. Anyone who seriously desires
to understand the language should have at least some of
these books and pamphlets.

A. WORDS

The basic unit of the language is a "word." Words can
be "colon definitions" (analogous to functions and
procedures in other languages), variables, and constants. New
words are defined in terms of previously defined words,
making the language extensible. Defined words are kept in a
linked list called the "dictionary." The dictionary is
maintained as a stack (Last-in-First-out or LIFO) so that
the newest words are searched first. Thus previously
defined words can be redefined. Dictionary entries are
pruned by using the word FORGET. When a word is
"forgotten," all words defined after it are also forgotten.
Rather than a straight linked list, the dictionary can be
extended in a tree structure where branches denote different
contexts. Table VII is a list of the FORTH-79 required
words. The words in lower-case are dictionary entries for
the run-time code for the corresponding compiling word.
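
The newest-first search, redefinition, and FORGET behavior can be sketched in a few lines. This is a Python illustration, not FORTH: the `Entry` and `Dictionary` classes are hypothetical, each entry's body is just a placeholder string, and vocabularies (the tree-structured extension) are not modeled.

```python
class Entry:
    """A dictionary entry: a name, a body, and a link to the older entry."""

    def __init__(self, name, body, link):
        self.name = name
        self.body = body
        self.link = link


class Dictionary:
    def __init__(self):
        self.latest = None  # most recently defined entry

    def define(self, name, body):
        """Add a word on top of the list; older entries are never altered."""
        self.latest = Entry(name, body, self.latest)

    def find(self, name):
        """Search from the newest entry back, so a redefinition wins."""
        entry = self.latest
        while entry is not None:
            if entry.name == name:
                return entry
            entry = entry.link
        return None

    def forget(self, name):
        """Drop the named word and every word defined after it."""
        entry = self.find(name)
        if entry is None:
            raise KeyError(name)
        self.latest = entry.link
```

Forgetting a word in this sketch also discards any later redefinitions, which brings an earlier definition of the same name back into view.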

B. SYSTEM DATA STRUCTURES

Figure A.1 depicts the standard FORTH memory
organization. The user dictionary grows up towards high memory
while the parameter stack grows down towards the dictionary.
The unused portion of memory separating the two is called
the pad. The beginning of the pad moves up in memory with
the dictionary pointer (DP). It is usually located 44H
bytes in front of the DP. Likewise, the input message
buffer grows up in memory according to the size of the input
message while the return stack grows down towards the
message buffer.

The parameter stack is used for mathematical data
manipulations and parameter passing. The data on the stack is
operated upon using reverse Polish (or postfix) notation,
similar to Hewlett-Packard calculators. The return stack is
used by FORTH for storing the interpreter pointer (the
address of the next higher context, i.e., the calling word).
The pad is used primarily for string manipulations. System
variables are those variables maintained and used by FORTH
and not directly accessible to the programmer. User
variables are declared, maintained and used by the system,
but are directly accessible to the programmer. Examples of
system variables are cold boot parameters and CP/M disk
interface parameters, while examples of user variables are
the dictionary pointer, the current radix (called BASE), and
the current execution state (called STATE).

TABLE VII

Figure A.1 Standard FORTH Memory Map

The number of block buffers is dependent upon the amount
of physical memory available. Standard FORTH blocks are 1K
bytes in size and are stored in secondary storage, thus
giving FORTH what its users call virtual memory. FORTH
automatically allocates buffers as they are needed according
to which buffers have not been allocated yet, the age of the
contents of occupied buffers, and whether any buffers
contain updated data. Blocks containing FORTH "programs"
are commonly referred to as "screens" because they are
formatted for CRT display, i.e., 16 lines of 64 characters.

C. THE MECHANICS OF FORTH

There are fewer than 73 assembly routines in FORTH-79,
most of which are less than 20 instructions long. When
FORTH words are interpreted, it is these routines which
ultimately are executed, except in the case of user code
defined words. All words in FORTH contain a code field
address (CFA) which is a pointer to an assembly language
routine which defines the word's run-time behavior. A
constant's CFA points to constant, which is an assembly
language routine which places the contents of the two bytes
following the CFA onto the parameter stack. A code defined
word's CFA simply points to the byte following the CFA, the
beginning of the word's code definition.


The CFA of a colon definition points to colon. See
Figure A.2 for the structure of a colon definition in the
PDBMS. This routine has different actions, depending upon
the specific version of FORTH (i.e., whether the system
increments the interpreter pointer before executing a word,
or after). In general though, colon pushes the current
value of the interpreter pointer (which points to the
current word being executed in the post-incrementing
systems) onto the return stack and then sets the interpreter
pointer equal to the contents of the first two bytes
following the current word's CFA. These two bytes contain a
pointer to the CFA of the first word in the currently
executing word's parameter field address (PFA). Thus the
execution of a word describes an inorder traversal of a tree
of FORTH words used to define a word and all words used in
those definitions, etc. Leaves on this tree are code
defined words, constants, variables, user variables, and
other data types; nodes are colon definitions.

Complementing colon is semicolon. This is the runtime
code of {;} which is the last word in every colon
definition. What semicolon does is simply pop the return stack
and set the interpreter pointer equal to the popped value.
This causes execution to move one layer higher in the tree
described above. The topmost word in the tree is QUIT,
which is an infinite loop. So when the interpreter
completes the execution or compilation of a word, execution
returns to QUIT which loops waiting for more input.

The heart of FORTH is the inner interpreter. On the
8080, Z80, and MC6800, all this short code routine does is
take the interpreter pointer and move it into the program
counter. This technique of passing control from word to
word makes FORTH almost incomprehensible until the whole
system is understood. Because FORTH uses almost no
subroutine calls and jumps, flow of control is not
immediately apparent. In 8080 fig-FORTH (version 1.3)
almost the entire FORTH system past the first 1K bytes
consists of "DB" and "DW" instructions. Like LISP, most
of FORTH consists of data structures which can be used as
data or executable code.

Figure A.2 Structure of a PDBMS Colon Definition.
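The interplay of colon, semicolon, and the inner interpreter described above can be illustrated with a small model. The sketch below is written in Python rather than assembly language, and every name in it (Word, VM, the sample words TWO, THREE, FIVE, and TEN) is an assumption made for illustration; it is not taken from fig-FORTH or from the PDBMS. It models a system that increments the interpreter pointer before executing a word, and it uses an empty interpreter pointer (None) in place of the QUIT loop at the top of the tree.

```python
# Toy model of colon, semicolon, and the inner interpreter.
# Every name below is illustrative; none is taken from fig-FORTH or the PDBMS.

class Word:
    def __init__(self, name, code, params=()):
        self.name = name            # dictionary name
        self.code = code            # routine the code field points to
        self.params = list(params)  # parameter field


class VM:
    def __init__(self):
        self.stack = []    # parameter stack
        self.rstack = []   # return stack
        self.ip = None     # interpreter pointer: (colon word, index) or None
        self.trace = []    # names of words in the order they were executed

    def execute(self, word):
        self.trace.append(word.name)
        word.code(self, word)

    def run(self, word):
        self.ip = None
        self.execute(word)
        while self.ip is not None:        # inner interpreter
            body, i = self.ip
            self.ip = (body, i + 1)       # increment before executing
            self.execute(body.params[i])


def colon(vm, word):
    vm.rstack.append(vm.ip)   # remember where the caller left off
    vm.ip = (word, 0)         # continue with the first word of the definition


def semicolon(vm, word):
    vm.ip = vm.rstack.pop()   # resume the caller


def constant(vm, word):
    vm.stack.append(word.params[0])


def plus(vm, word):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)


SEMI = Word("(;)", semicolon)
PLUS = Word("+", plus)
TWO = Word("TWO", constant, [2])
THREE = Word("THREE", constant, [3])
FIVE = Word("FIVE", colon, [TWO, THREE, PLUS, SEMI])   # : FIVE TWO THREE + ;
TEN = Word("TEN", colon, [FIVE, FIVE, PLUS, SEMI])     # : TEN FIVE FIVE + ;

vm = VM()
vm.run(TEN)
print(vm.stack)   # [10]
print(vm.trace)   # order in which the tree of words was visited
```

The trace shows the traversal: TEN descends into FIVE, FIVE runs its leaves, semicolon returns to TEN, and so on until the return stack is empty.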
STUDY STATISTICS


A.  BACKGROUND 


In order to understand what might be involved in a
Personal Database Management System, four address books were
studied in detail. The results of this study served as a
basis for much of the design of the PDBMS. It should be
pointed out that the results of this study are probably not
indicative of the American population as a whole. The books
were not selected on any scientific basis and had the
following important characteristics which probably skewed
the findings:


•  All of the books belonged to friends and neighbors of
the author in California. Thus many addresses, zip
codes, area codes, etc., had common values.

•  All of the books were kept for families and not
individuals. The effect of this is uncertain, but because
of this, entries in these books fell into four distinct
categories:

   o  The husband's relatives (characterized by similar
   names, cities, states, zip codes, etc.).

   o  The wife's relatives (having the same characteristics
   as mentioned above).

   o  Local friends (characterized by similar cities, states,
   zip codes, telephone area codes and exchanges, etc.).

   o  Non-local friends (which had little in common, except
   perhaps the military in many cases).

•  All of the families had at least one member in the armed
forces. This seemed to introduce many acronyms and
abbreviations which are probably not very common in
civilian spheres. This probably also accounted for a
larger than usual number of "non-local friend" entries.


B.  METHOD  OF  ANALYSIS 


Each of the books was recorded into a file of its own in
a fashion which changed it as little as possible from the
original. Non-alphabetic and graphic symbols were
represented by their closest ASCII equivalent, if there was one.
Otherwise an alternate such as "3" was chosen. Statistical
analysis was performed on these files but is not included
because the files contained lower-case letters and a large
number of spaces (used for formatting). It was felt that
these conditions made this first set of files inappropriate
for use with the PDBMS.

After the above files had been created, they were
then copied to another set of files. In transferring the
data, all lower-case letters were converted to upper-case
and multiple spaces were removed. Tables VIII, X, XI, XV,
XVI, and XVII present the results of the analysis of these
files.
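A minimal sketch of the conversion from the first set of files to the second is given below. It assumes that "multiple spaces were removed" means each run of spaces was collapsed to a single space; the function name normalize_line is hypothetical.

```python
import re


def normalize_line(line: str) -> str:
    """Convert a line to upper case and collapse each run of spaces to one space.

    Assumption: "removing multiple spaces" means collapsing runs, not
    deleting every space.
    """
    return re.sub(r" {2,}", " ", line.upper())


print(normalize_line("John  Smith   apt 4"))  # JOHN SMITH APT 4
```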

Finally this second set of files was copied to a third
set using a transformation which was designed to reduce the
skewness of the letter and digit distributions. This was
done at a time when it had not yet been decided not to use
text compression. Many text compression techniques require
knowledge of the distribution of the symbols. It was hoped
that something close to the letter distribution of standard
English would be obtained. The tables which use the label
"After" reflect the data gained from analyzing this last set
of files. The distribution of the letter frequencies for
English was taken from reference [14]. What follows are
the rules applied to the second set of files to produce the
third set. They are listed in the order in which they were
applied.

•  Remove all redundant surnames.

•  Remove all redundant city names for cities in the same
state. Any form of the name is removed (including
abbreviations) leaving the longest form.

•  Remove all redundant zip codes.

•  Remove all redundant telephone exchange numbers within
the same area code.

•  Remove all area codes and state names.

•  Remove the first three digits of each zip code
remaining. These digits indicate the post office's
geographical region (the first digit) and major city or
distribution point (second and third digits).
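The rules above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: the record layout (a dictionary with the keys surname, city, state, zip, area_code, and exchange) and both function names are hypothetical, a "redundant" value is taken to be any occurrence after the first, and the handling of abbreviated city names in the second rule is omitted.

```python
def drop_repeats(entries, field, scope=None):
    """Blank `field` in every entry after the first one holding a given value.

    If `scope` names another field, values are only treated as repeats
    when the scope field is equal as well (e.g. same city in the same state).
    """
    seen = set()
    for entry in entries:
        value = entry.get(field)
        if value is None:
            continue
        key = (entry.get(scope), value) if scope else value
        if key in seen:
            entry[field] = None
        else:
            seen.add(key)


def reduce_entries(entries):
    """Apply the six rules, in the order listed, to copies of the entries."""
    entries = [dict(e) for e in entries]
    drop_repeats(entries, "surname")                       # rule 1
    drop_repeats(entries, "city", scope="state")           # rule 2 (no abbreviations)
    drop_repeats(entries, "zip")                           # rule 3
    drop_repeats(entries, "exchange", scope="area_code")   # rule 4
    for entry in entries:
        entry["area_code"] = None                          # rule 5
        entry["state"] = None
        if entry.get("zip") is not None:                   # rule 6
            entry["zip"] = entry["zip"][3:]
    return entries


sample = [
    {"surname": "SMITH", "city": "MONTEREY", "state": "CA",
     "zip": "93940", "area_code": "408", "exchange": "646"},
    {"surname": "SMITH", "city": "MONTEREY", "state": "CA",
     "zip": "93940", "area_code": "408", "exchange": "646"},
    {"surname": "JONES", "city": "SALINAS", "state": "CA",
     "zip": "93901", "area_code": "408", "exchange": "646"},
]
reduced = reduce_entries(sample)
print(reduced[0]["zip"], reduced[1]["surname"], reduced[2]["zip"])  # 40 None 01
```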


The data in the first and second sets of files, though
obviously address book data, could not be used as a
representative sample of the "average" American address book.
For example, 310 (6 percent) of the wordds in the address
books refer to the states of California, Maryland, North
Carolina, New York, Virginia, and Washington. This would
probably serve as a poor basis for predicting the contents
of the address book of someone living in Chicago. For this
reason the above transformation was used in an attempt to
remove the influence of family names and geographical
locations from the data, yielding a sample more representative of
an "average" address book. Because the PDBMS is not
designed to handle only one specific person's information,
an average address book was needed in order to determine the
utility of algorithms and data structures. If the address
books had been found to contain almost no redundancies, then
the idea of using a DB dictionary probably would have been
discarded.


C.  RESULTS OF THE ANALYSIS


In the tables appearing in this appendix, the words
"wordd," "char," and "punctuation" are used to connote the
definitions ascribed to them in Table I. The word
"character" is used to mean all printing ASCII characters and the
space. All percentages, except those in Table X, reflect
the percentage of all characters.

The difference between the number of unique wordds
in Tables VIII and IX is a result of the reduction of zip
codes to their last two digits. The differences are equal
to the number of unique zip codes. Also notice that the sum
of the unique wordds in the four books is not equal to the
number in the total column. This is because the total shown
is the number of unique wordds in all four books as a whole.
Lastly, the reduction of the number of characters includes
not only those chars in the deleted wordds, but also the
punctuation following the ends of and between the wordds
deleted during the creation of the third set of files.

2.  &£& 

Table X indicates that the PDBMS, as it is designed,
is not as efficient with memory, when compared to a system
which simply inserted plain text (i.e., did not use a DB
dictionary, etc.). Between the DB dictionary and the
logical records, every unique wordd in the PDBMS requires at
least nine bytes (seven for the DB dictionary entry and two
in the logical record). Wordds which are duplicates of
wordds previously entered into the PDBMS require five bytes
(three in the DB dictionary used for the field ID and the
pointer to the physical record, and two in the logical
record used for the first letter of the wordd and the
                       Book 1       Book 2       Book 3       Book 4        Total

Records                    80          129           88          111          408
Fields                    340          472          346          350         1508
Characters               6173         8409         5908         6248        26738
Chars             [illegible]         6639  [illegible]  [illegible]        21660
Wordds            [illegible]         1579  [illegible]  [illegible]         4961
Unique Wordds             749          958  [illegible]          723         3170

TABLE IX

General Statistics - After


wordd length in the four books is 4.37 chars. In order to
be better than or equivalent to a system using plain text in


TABLE X

Wordd Length Distribution

Wordd Length     Frequency

                      6.25
                     14.67
                     18.93
                     16.13
                     18.87
                      8.61
                      7.01
                      4.90
                      2.34
                      1.23
                      0.73


records requires highly redundant information. The four
books together require approximately 34K bytes of storage as
plain text (this includes administrative overhead). However,
this does not include the storage required for indices
needed to provide random access; only sequential access is
possible with only 34K bytes of storage. Based upon the
data derived from the four books, the PDBMS would require
approximately 45K bytes to store the same information (27K
bytes for the dictionary and 18K for the files, again
including administrative overhead). However, unlike the 34K
bytes above, these 45K bytes include storage dedicated to
providing random access.
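The per-wordd costs quoted earlier in this section (at least nine bytes for a unique wordd, five for a duplicate) give a rough feel for how much redundancy is needed before the dictionary scheme uses no more space than plain text. The sketch below is only an illustration: the function names are hypothetical, the nine-byte figure is a minimum rather than an average, and the plain-text cost assumes, purely for the sake of the example, one separator character per wordd in addition to the 4.37-char average length. Administrative overhead and indices are ignored.

```python
UNIQUE_WORDD_BYTES = 9      # 7 in the DB dictionary + 2 in the logical record (minimum)
DUPLICATE_WORDD_BYTES = 5   # 3 in the DB dictionary + 2 in the logical record


def pdbms_bytes_per_wordd(unique_fraction: float) -> float:
    """Lower bound on the average bytes per wordd for a given share of unique wordds."""
    return (UNIQUE_WORDD_BYTES * unique_fraction
            + DUPLICATE_WORDD_BYTES * (1.0 - unique_fraction))


def break_even_unique_fraction(plain_bytes_per_wordd: float) -> float:
    """Share of unique wordds at which both schemes cost the same per wordd."""
    spread = UNIQUE_WORDD_BYTES - DUPLICATE_WORDD_BYTES
    return (plain_bytes_per_wordd - DUPLICATE_WORDD_BYTES) / spread


# Assumed plain-text cost: average wordd length plus one separator character.
plain = 4.37 + 1
print(round(break_even_unique_fraction(plain), 4))  # 0.0925
```

Under these assumptions fewer than about one wordd in ten could be unique before the dictionary scheme matched plain text on space alone.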


Tables XI, XII, XIII, XIV, XV, and XVI present data
on the symbols found in the four address books. Notice from
Table XVI that these books are clearly not
samples of normal English text. For the most part, the
books are "fairly uniform" in their use of letters and
digits; this is not the case with punctuation. Book 1 is
distinctive in that it is the only one where a dollar sign,
colons, and semicolons appear. Book 2 uses an unusually
large number of "other" punctuation characters. These
punctuation characters are those which were used to represent
graphic, non-alphabetic symbols. Book 4 is unlike the
others in that it uses the plus sign as the abbreviation for
the word "and" whereas the other books use the ampersand.
Book 4 also contains a relatively small number of
parentheses, dashes, periods, and "others" compared to the other
books.
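Table XVI summarizes the comparison with standard English as a single statistic computed from observed and expected letter counts. The sketch below shows how such a statistic is commonly computed, assuming the usual Pearson chi-square formula (the sum over all letters of the squared difference between observed and expected counts, divided by the expected count). The function names and the three-symbol example are hypothetical and are not data from the study.

```python
def expected_counts(total, proportions):
    """Expected count of each symbol, given a total count and reference proportions."""
    return [total * p for p in proportions]


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical three-symbol alphabet with reference proportions 25/50/25 percent.
observed = [30, 50, 20]
expected = expected_counts(sum(observed), [0.25, 0.50, 0.25])
print(chi_square(observed, expected))  # 2.0
```

A larger statistic indicates a letter distribution further from the reference distribution.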

4.  Initial Analysis

Tables XVII and XVIII show the distribution of all
alphabetic wordds in the four books as a whole by their
first letter. What is shown in the "Most Frequent Wordds"
column are those wordds which account for approximately 30
percent of the total number of wordds starting with the
letter in the corresponding first column. Notice that
surnames, cities, and states do not appear in Table XVIII
because only one occurrence of each remains in the third
set of files. One noticeable exception is the town name
Westminster. The wordd appears in Table XVIII because three
different towns occur in the four different books
(Westminster, California; Westminster, Colorado; and
Westminster, Maryland). As proof of the skewed nature of the
information notice the large number of occurrences of the


TABLE XVI

Comparison with Standard English

                  Before                      After
Letter     Observed     Expected     Observed     Expected

A            418.00       322.76       332.25       273.98
B            107.75        60.52        95.25        51.37
C            180.25       121.04       131.50       102.74
D       [illegible]       161.38       131.75       136.99
E       [illegible]       524.49       362.50       445.22

F       [illegible]        80.69        40.75        68.50
G       [illegible]        60.52        58.75        51.37
H       [illegible]  [illegible]       110.25       205.49
I       [illegible]  [illegible]       194.75       222.61
J             33.25        20.17        33.00        17.12

K             66.25        20.17        54.25        17.12
L            234.00       141.21       200.25       119.87
M            141.75       121.04       117.25       102.74
N            297.25       282.42       246.00       239.73
O            286.25       322.76       249.00       273.98

P       [illegible]        80.69        73.25        68.50
Q       [illegible]  [illegible]  [illegible]         8.56
R       [illegible]  [illegible]  [illegible]       222.61
S       [illegible]  [illegible]       294.00       205.49
T       [illegible]  [illegible]       200.00       308.23

U       [illegible]       121.04        77.00       102.74
V       [illegible]        40.35        63.50        34.25
W       [illegible]        60.52        49.25        51.37
X       [illegible]        20.17        22.50        17.12
Y       [illegible]        80.69        68.75        68.50
Z             11.50        10.09         9.25         8.56

χ² Statistic Before: 466.89
χ² Statistic After:  387.44


abbreviations for the states of California (CA), North
Carolina (NC), New York (NY), and Washington (WA). The
large number of P's and O's can be accounted for by the
large number of occurrences of the wordd "P.O." as an
abbreviation for post office.


These two tables also support the premise that these
address books are not from normal English text. The English
words "THE," "OF," and "AND" make up 13.75 percent of all
words in English text. These same words make up less than
one percent of the wordds in the address books. In fact,
less than one percent of the wordds in the four address
books are among the 46 most frequently occurring words in the
English language. These 46 words account for more than 41
percent of all words in English text [15].
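The construction of the "Most Frequent Wordds" column in Tables XVII and XVIII can be sketched as follows. This is a minimal illustration, not the program used in the study: the function name and the sample data are hypothetical, "alphabetic" is taken to mean that a wordd consists only of letters, and wordds are added in descending order of frequency until roughly 30 percent of the wordds beginning with that letter are covered.

```python
from collections import Counter, defaultdict


def most_frequent_by_initial(wordds, share=0.30):
    """Group alphabetic wordds by first letter and list the most frequent ones.

    For each letter, wordds are taken in descending order of frequency until
    they cover at least `share` of all wordds starting with that letter.
    """
    by_letter = defaultdict(Counter)
    for wordd in wordds:
        if wordd.isalpha():                 # assumption: letters only
            by_letter[wordd[0]][wordd] += 1

    table = {}
    for letter, counts in sorted(by_letter.items()):
        total = sum(counts.values())
        picked, covered = [], 0
        for wordd, n in counts.most_common():
            if covered >= share * total:
                break
            picked.append((wordd, n))
            covered += n
        table[letter] = {"unique": len(counts), "total": total,
                         "most_frequent": picked}
    return table


# Hypothetical sample, not data from the four address books.
sample_wordds = ["CA", "CA", "CA", "CT", "COURT", "NY", "NC", "NC", "123"]
table = most_frequent_by_initial(sample_wordds)
print(table["C"])
print(table["N"])
```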
TABLE XVII

Initial Letters of Wordds - Before

[Table body not recoverable. Legible column headings: No. of Wordds, Total No. of Wordds, Count.]
TABLE XVIII

Initial Letters of Wordds - After

[Table body not recoverable. Legible column heading: Unique Wordds.]